RECOMBINANT METHODS AND MATERIALS FOR PRODUCING
EPOTHILONE AND EPOTHILONE DERIVATIVES 


Reference to Government Funding 

This invention was supported in part by SBIR grant 1R43-CA79228-01. The U.S. 
government has certain rights in this invention. 


Field of the Invention 

The present invention provides recombinant methods and materials for producing 
epothilone and epothilone derivatives. The invention relates to the fields of agriculture, 
chemistry, medicinal chemistry, medicine, molecular biology, and pharmacology. 


Background of the Invention 

The epothilones were first identified by Gerhard Höfle and colleagues at the
National Biotechnology Research Institute as an antifungal activity extracted from the
myxobacterium Sorangium cellulosum (see K. Gerth et al., 1996, J. Antibiotics 49:
560-563 and Germany Patent No. DE 41 38 042). The epothilones were later found to have
activity in a tubulin polymerization assay (see D. Bollag et al., 1995, Cancer Res.
55:2325-2333) to identify antitumor agents and have since been extensively studied as
potential antitumor agents for the treatment of cancer.

The chemical structure of the epothilones produced by Sorangium cellulosum
strain So ce 90 was described in Höfle et al., 1996, Epothilone A and B - novel
16-membered macrolides with cytotoxic activity: isolation, crystal structure, and
conformation in solution, Angew. Chem. Int. Ed. Engl. 35(13/14): 1567-1569,
incorporated herein by reference. The strain was found to produce two epothilone
compounds, designated A (R = H) and B (R = CH3), as shown below, which showed broad
cytotoxic activity against eukaryotic cells and noticeable activity and selectivity against
breast and colon tumor cell lines.

The desoxy counterparts of epothilones A and B, also known as epothilones C
(R = H) and D (R = CH3), are known to be less cytotoxic, and the structures of these
epothilones are shown below.



Two other naturally occurring epothilones have been described. These are 
epothilones E and F, in which the methyl side chain of the thiazole moiety of epothilones 
A and B has been hydroxylated to yield epothilones E and F, respectively. 

Because of the potential for use of the epothilones as anticancer agents, and 
because of the low levels of epothilone produced by the native So ce 90 strain, a number 
of research teams undertook the effort to synthesize the epothilones. This effort has been 
successful (see Balog et al., 1996, Total synthesis of (-)-epothilone A, Angew. Chem. Int.
Ed. Engl. 35(23/24): 2801-2803; Su et al., 1997, Total synthesis of (-)-epothilone B: an
extension of the Suzuki coupling method and insights into structure-activity relationships
of the epothilones, Angew. Chem. Int. Ed. Engl. 36(7): 757-759; Meng et al., 1997, Total
syntheses of epothilones A and B, JACS 119(42): 10073-10092; and Balog et al., 1998, A
novel aldol condensation with 2-methyl-4-pentenal and its application to an improved total
synthesis of epothilone B, Angew. Chem. Int. Ed. Engl. 37(19): 2675-2678, each of which
is incorporated herein by reference). Despite the success of these efforts, the chemical 
synthesis of the epothilones is tedious, time-consuming, and expensive. Indeed, the 
methods have been characterized as impractical for the full-scale pharmaceutical 
development of an epothilone. 

A number of epothilone derivatives, as well as epothilones A - D, have been
studied in vitro and in vivo (see Su et al., 1997, Structure-activity relationships of the
epothilones and the first in vivo comparison with paclitaxel, Angew. Chem. Int. Ed. Engl.
36(19): 2093-2096; and Chou et al., Aug. 1998, Desoxyepothilone B: an efficacious
microtubule-targeted antitumor agent with a promising in vivo profile relative to
epothilone B, Proc. Natl. Acad. Sci. USA 95: 9642-9647, each of which is incorporated
herein by reference). Additional epothilone derivatives and methods for synthesizing
epothilones and epothilone derivatives are described in PCT patent publication Nos.
99/54330, 99/54319, 99/54318, 99/43653, 99/43320, 99/42602, 99/40047, 99/27890,
99/07692, 99/02514, 99/01124, 98/25929, 98/22461, 98/08849, and 97/19086; U.S. Patent
No. 5,969,145; and Germany patent publication No. DE 41 38 042, each of which is
incorporated herein by reference.

There remains a need for economical means to produce not only the naturally
occurring epothilones but also the derivatives or precursors thereof, as well as new
epothilone derivatives with improved properties. There remains a need for a host cell that
produces epothilones or epothilone derivatives that is easier to manipulate and ferment
than the natural producer Sorangium cellulosum. The present invention meets these and
other needs.

Summary of the Invention 

In one embodiment, the present invention provides recombinant DNA compounds
that encode the proteins required to produce epothilones A, B, C, and D. The present
invention also provides recombinant DNA compounds that encode portions of these
proteins. The present invention also provides recombinant DNA compounds that encode a
hybrid protein, which hybrid protein includes all or a portion of a protein involved in
epothilone biosynthesis and all or a portion of a protein involved in the biosynthesis of
another polyketide or non-ribosomal-derived peptide. In a preferred embodiment, the
recombinant DNA compounds of the invention are recombinant DNA cloning vectors that
facilitate manipulation of the coding sequences or recombinant DNA expression vectors
that code for the expression of one or more of the proteins of the invention in recombinant
host cells.

In another embodiment, the present invention provides recombinant host cells that
produce a desired epothilone or epothilone derivative. In one embodiment, the invention
provides host cells that produce one or more of the epothilones or epothilone derivatives at
higher levels than produced in the naturally occurring organisms that produce epothilones.
In another embodiment, the invention provides host cells that produce mixtures of
epothilones that are less complex than the mixtures produced by naturally occurring host
cells. In another embodiment, the present invention provides non-Sorangium recombinant
host cells that produce an epothilone or epothilone derivative.

In a preferred embodiment, the host cells of the invention produce less complex
mixtures of epothilones than do naturally occurring cells that produce epothilones.
Naturally occurring cells that produce epothilones typically produce a mixture of
epothilones A, B, C, D, E, and F. The table below summarizes the epothilones produced in
different illustrative host cells of the invention.


Cell Type    Epothilones Produced    Epothilones Not Produced
1            A, B, C, D, E, F        (none)
2            A, C, E                 B, D, F
3            B, D, F                 A, C, E
4            A, B, C, D              E, F
5            A, C                    B, D, E, F
6            C                       A, B, D, E, F
7            B, D                    A, C, E, F
8            D                       A, B, C, E, F

In addition, cell types may be constructed which produce only the newly
discovered epothilones G and H, further discussed below, and one or the other of G and H
or both in combination with the downstream epothilones. Thus, it is understood, based on
the present invention, that the biosynthetic pathways which relate the naturally occurring
epothilones are, respectively, G → C → A → E and H → D → B → F. Appropriate
enzymes may also convert members of each pathway to the corresponding member of the
other.

Thus, the recombinant host cells of the invention also include host cells that 
produce only one desired epothilone or epothilone derivative. 
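
The two pathways stated above can be written down as a small lookup table. The following is only an illustrative sketch: the names `NEXT_STEP` and `downstream` are hypothetical, and the model records nothing beyond the order of the epothilones given in the text (it does not model enzymes, yields, or cross-pathway conversion).

```python
# Illustrative model of the two pathways stated in the text:
#   G -> C -> A -> E   and   H -> D -> B -> F
# NEXT_STEP and downstream() are hypothetical names used only for this sketch.
NEXT_STEP = {
    "G": "C", "C": "A", "A": "E",
    "H": "D", "D": "B", "B": "F",
}


def downstream(epothilone):
    """Return, in order, the epothilones that follow the given one in its pathway."""
    products = []
    current = epothilone
    while current in NEXT_STEP:
        current = NEXT_STEP[current]
        products.append(current)
    return products


if __name__ == "__main__":
    for start in ("G", "H", "D", "F"):
        print(start, "->", downstream(start))
```

Under this model, a host cell whose pathway stops at epothilone D, for example, has B and F as the downstream compounds it does not make.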

In another embodiment, the invention provides Sorangium host cells that have
been modified genetically to produce epothilones either at levels greater than those
observed in naturally occurring host cells or as less complex mixtures of epothilones than
produced by naturally occurring host cells, or produce an epothilone derivative that is not
produced in nature. In a preferred embodiment, the host cell produces the epothilones at
equal to or greater than 20 mg/L.

In another embodiment, the recombinant host cells of the invention are host cells
other than Sorangium cellulosum that have been modified genetically to produce an
epothilone or an epothilone derivative. In a preferred embodiment, the host cell produces
the epothilones at equal to or greater than 20 mg/L. In a more preferred embodiment, the
recombinant host cells are Myxococcus, Pseudomonas, or Streptomyces host cells that
produce the epothilones or an epothilone derivative at equal to or greater than 20 mg/L.

In another embodiment, the present invention provides novel compounds useful in
agriculture, veterinary practice, and medicine. In one embodiment, the compounds are
useful as fungicides. In another embodiment, the compounds are useful in cancer
chemotherapy. In a preferred embodiment, the compound is an epothilone derivative that
is at least as potent against tumor cells as epothilone B or D. In another embodiment, the
compounds are useful as immunosuppressants. In another embodiment, the compounds are
useful in the manufacture of another compound. In a preferred embodiment, the
compounds are formulated in a mixture or solution for administration to a human or
animal.

These and other embodiments of the invention are described in more detail in the 
following description, the examples, and claims set forth below. 

Brief Description of the Figures 

Figure 1 shows a restriction site map of the insert Sorangium cellulosum genomic
DNA in four overlapping cosmid clones (designated 8A3, 1A2, 4, and 85 and
corresponding to pKOS35-70.8A3, pKOS35-70.1A2, pKOS35-70.4, and pKOS35-79.85,
respectively) spanning the epothilone gene cluster. A functional map of the epothilone
gene cluster is also shown. The loading domain (Loading, epoA), the non-ribosomal
peptide synthase (NRPS, Module 1, epoB) module, and each module (Modules 2 through
9, epoC, epoD, epoE, and epoF) of the remaining eight modules of the epothilone synthase
gene are shown, as is the location of the epoK gene that encodes a cytochrome P450-like
epoxidation enzyme.

Figure 2 shows a number of precursor compounds to N-acylcysteamine thioester
derivatives that can be supplied to an epothilone PKS of the invention in which the
NRPS-like module 1 or module 2 KS domain has been inactivated to produce a novel epothilone
derivative. A general synthetic procedure for making such compounds is also shown.

Figure 3 shows restriction site and function maps of plasmids pKOS35-82.1 and
pKOS35-82.2.

Figure 4 shows restriction site and function maps of plasmids pKOS35-154 and 
pKOS90-22. 

Figure 5 shows a schematic of a protocol for introducing the epothilone PKS and
modification enzyme genes into the chromosome of a Myxococcus xanthus host cell as
described in Example 3.

Figure 6 shows restriction site and function maps of plasmids pKOS039-124 and
pKOS039-124R.

Figure 7 shows a restriction site and function map of plasmid pKOS039-126R. 

Figure 8 shows a restriction site and function map of plasmid pKOS039-141.

Figure 9 shows a restriction site and function map of plasmid pKOS045-12. 

Detailed Description of the Invention 

The present invention provides the genes and proteins that synthesize the
epothilones in Sorangium cellulosum in recombinant and isolated form. As used herein,
the term recombinant refers to a compound or composition produced by human
intervention, typically by specific and directed manipulation of a gene or portion thereof.
The term isolated refers to a compound or composition in a preparation that is
substantially free of contaminating or undesired materials or, with respect to a compound
or composition found in nature, substantially free of the materials with which that
compound or composition is associated in its natural state. The epothilones (epothilone A,
B, C, D, E, and F) and compounds structurally related thereto (epothilone derivatives) are
potent cytotoxic agents specific for eukaryotic cells. These compounds have application as
antifungals, cancer chemotherapeutics, and immunosuppressants. The epothilones are
produced at very low levels in the naturally occurring Sorangium cellulosum cells in
which they have been identified. Moreover, S. cellulosum is very slow growing, and
fermentation of S. cellulosum strains is difficult and time-consuming. One important
benefit conferred by the present invention is the ability simply to produce an epothilone or
epothilone derivative in a non-S. cellulosum host cell. Another advantage of the present
invention is the ability to produce the epothilones at higher levels and in greater amounts
in the recombinant host cells provided by the invention than possible in the naturally
occurring epothilone producer cells. Yet another advantage is the ability to produce an
epothilone derivative in a recombinant host cell.

The isolation of recombinant DNA encoding the epothilone biosynthetic genes
resulted from the probing of a genomic library of Sorangium cellulosum SMP44 DNA. As
described more fully in Example 1 below, the library was prepared by partially digesting
S. cellulosum genomic DNA with restriction enzyme SauIIIA1 and inserting the DNA
fragments generated into BamHI-digested Supercos™ cosmid DNA (Stratagene). Cosmid
clones containing epothilone gene sequences were identified by probing with DNA probes
specific for sequences from PKS genes and reprobing with secondary probes comprising
nucleotide sequences identified with the primary probes.

Four overlapping cosmid clones were identified by this effort. These four cosmids
were deposited with the American Type Culture Collection (ATCC), Manassas, VA, USA,
under the terms of the Budapest Treaty, and assigned ATCC accession numbers. The
clones (and accession numbers) were designated as cosmids pKOS35-70.1A2 (ATCC
203782), pKOS35-70.4 (ATCC 203781), pKOS35-70.8A3 (ATCC 203783), and
pKOS35-79.85 (ATCC 203780). The cosmids contain insert DNA that completely spans the
epothilone gene cluster. A restriction site map of these cosmids is shown in Figure 1.
Figure 1 also provides a function map of the epothilone gene cluster, showing the location
of the six epothilone PKS genes and the epoK P450 epoxidase gene.

The epothilone PKS genes, like other PKS genes, are composed of coding
sequences organized to encode a loading domain, a number of modules, and a thioesterase
domain. As described more fully below, each of these domains and modules corresponds
to a polypeptide with one or more specific functions. Generally, the loading domain is
responsible for binding the first building block used to synthesize the polyketide and
transferring it to the first module. The building blocks used to form complex polyketides
are typically acylthioesters, most commonly acetyl, propionyl, malonyl, methylmalonyl,
and ethylmalonyl CoA. Other building blocks include amino acid-like acylthioesters.
PKSs catalyze the biosynthesis of polyketides through repeated, decarboxylative Claisen
condensations between the acylthioester building blocks. Each module is responsible for
binding a building block, performing one or more functions on that building block, and
transferring the resulting compound to the next module. The next module, in turn, is
responsible for attaching the next building block and transferring the growing compound
to the next module until synthesis is complete. At that point, an enzymatic thioesterase
(TE) activity cleaves the polyketide from the PKS.

Such modular organization is characteristic of the class of PKS enzymes that
synthesize complex polyketides and is well known in the art. Recombinant methods for
manipulating modular PKS genes are described in U.S. Patent Nos. 5,672,491; 5,712,146;
5,830,750; and 5,843,718; and in PCT patent publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. The polyketide known as
6-deoxyerythronolide B (6-dEB) is synthesized by a PKS that is a prototypical modular PKS
enzyme. The genes, known as eryAI, eryAII, and eryAIII, that code for the multi-subunit
protein known as deoxyerythronolide B synthase or DEBS (each subunit is known as
DEBS1, DEBS2, or DEBS3) that synthesizes 6-dEB are described in U.S. Patent Nos.
5,712,146 and 5,824,513, incorporated herein by reference.

The loading domain of the DEBS PKS consists of an acyltransferase (AT) and an
acyl carrier protein (ACP). The AT of the DEBS loading domain recognizes propionyl
CoA (other loading domain ATs can recognize other acyl-CoAs, such as acetyl, malonyl,
methylmalonyl, or butyryl CoA) and transfers it as a thioester to the ACP of the loading
domain. Concurrently, the AT on each of the six extender modules recognizes a
methylmalonyl CoA (other extender module ATs can recognize other CoAs, such as
malonyl or alpha-substituted malonyl CoAs, i.e., malonyl, ethylmalonyl, and
2-hydroxymalonyl CoA) and transfers it to the ACP of that module to form a thioester. Once
DEBS is primed with acyl- and methylmalonyl-ACPs, the acyl group of the loading
domain migrates to form a thioester (trans-esterification) at the KS of the first module; at
this stage, module one possesses an acyl-KS adjacent to a methylmalonyl ACP. The acyl
group derived from the DEBS loading domain is then covalently attached to the
alpha-carbon of the extender group to form a carbon-carbon bond, driven by concomitant
decarboxylation, and generating a new acyl-ACP that has a backbone two carbons longer
than the loading unit (elongation or extension). The growing polyketide chain is
transferred from the ACP to the KS of the next module of DEBS, and the process
continues.
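
The elongation arithmetic described above (each extender module lengthens the backbone by two carbons) can be sketched as follows. This is a minimal illustration, not part of the disclosed methods: the function name `backbone_lengths` is hypothetical, and the count covers backbone carbons only, ignoring the methyl branches contributed by methylmalonyl extender units.

```python
# Minimal sketch of chain elongation on a modular PKS.
# Assumption: only backbone carbons are counted; side-chain methyl groups
# from methylmalonyl extender units are ignored.


def backbone_lengths(starter_carbons, extender_modules):
    """Return the backbone carbon count after loading and after each extender module."""
    if starter_carbons < 1 or extender_modules < 0:
        raise ValueError("starter_carbons must be >= 1 and extender_modules >= 0")
    lengths = [starter_carbons]
    for _ in range(extender_modules):
        # Each decarboxylative condensation adds two backbone carbons.
        lengths.append(lengths[-1] + 2)
    return lengths


if __name__ == "__main__":
    # DEBS as described in the text: a propionyl starter (three carbons)
    # followed by six extender modules.
    print(backbone_lengths(3, 6))
```

For DEBS, with a three-carbon propionyl starter and six extender modules, the final entry is a 15-carbon backbone.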

The polyketide chain, growing by two carbons for each module of DEBS, is
sequentially passed as a covalently bound thioester from module to module, in an
assembly line-like process. The carbon chain produced by this process alone would
possess a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, additional enzymatic activities modify the
beta keto group of each two carbon unit just after it has been added to the growing
polyketide chain but before it is transferred to the next module. Thus, in addition to the
minimal module containing KS, AT, and ACP necessary to form the carbon-carbon bond,
modules may contain a ketoreductase (KR) that reduces the keto group to an alcohol.
Modules may also contain a KR plus a dehydratase (DH) that dehydrates the alcohol to a
double bond. Modules may also contain a KR, a DH, and an enoylreductase (ER) that
converts the double bond to a saturated single bond using the beta carbon as a methylene
function. The DEBS modules include those with only a KR domain, only an inactive KR
domain, and with all three KR, DH, and ER domains.
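
The relationship between the reductive domains of a module and the state of the beta carbon it leaves behind can be summarized in a few lines. The sketch below is illustrative only: the function name `beta_carbon_state` is hypothetical, and it assumes the caller lists only the catalytically active domains, so that an inactive KR is treated as absent.

```python
# Illustrative mapping from the active reductive domains of a module to the
# beta-carbon functionality, following the KR -> DH -> ER order in the text.
# Assumption: inactive domains are omitted from the input.

REDUCTIVE = {"KR", "DH", "ER"}


def beta_carbon_state(active_domains):
    """Return the beta-carbon state produced by a module's active domains."""
    reductive = set(active_domains) & REDUCTIVE
    if not reductive:
        return "ketone"        # minimal module: KS, AT, ACP only
    if reductive == {"KR"}:
        return "hydroxyl"      # keto group reduced to an alcohol
    if reductive == {"KR", "DH"}:
        return "double bond"   # alcohol dehydrated
    if reductive == {"KR", "DH", "ER"}:
        return "methylene"     # double bond reduced to a single bond
    # DH needs the alcohol made by KR, and ER needs the double bond made by DH.
    raise ValueError("unsupported combination: %s" % sorted(reductive))


if __name__ == "__main__":
    for domains in (["KS", "AT", "ACP"],
                    ["KS", "AT", "KR", "ACP"],
                    ["KS", "AT", "DH", "KR", "ACP"],
                    ["KS", "AT", "DH", "ER", "KR", "ACP"]):
        print(domains, "->", beta_carbon_state(domains))
```

The error branch reflects the stepwise logic in the paragraph above: a combination such as DH without KR has no substrate to act on in this simplified model.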

Once a polyketide chain traverses the final module of a PKS, it encounters the 
releasing domain or thioesterase found at the carboxyl end of most PKSs. Here, the 
polyketide is cleaved from the enzyme and, for most but not all polyketides, cyclized. The 
polyketide can be modified further by tailoring or modification enzymes; these enzymes 
add carbohydrate groups or methyl groups, or make other modifications, i.e., oxidation or 
reduction, on the polyketide core molecule. For example, 6-dEB is hydroxylated, 
methylated, and glycosylated (glycosidated) to yield the well known antibiotic
erythromycin A in the Saccharopolyspora erythraea cells in which it is produced
naturally.

While the above description applies generally to modular PKS enzymes and
specifically to DEBS, there are a number of variations that exist in nature. For example,
many PKS enzymes comprise loading domains that, unlike the loading domain of DEBS,
comprise an inactive KS domain that functions as a decarboxylase. This inactive KS is
in most instances called KS^Q, where the superscript is the single-letter abbreviation for the
amino acid (glutamine) that is present instead of the active site cysteine required for
ketosynthase activity. The epothilone PKS loading domain contains a KS^Y domain, not
present in other PKS enzymes for which amino acid sequence is currently available, in
which the amino acid tyrosine has replaced the cysteine. The present invention provides
recombinant DNA coding sequences for this novel KS domain.

Another important variation in PKS enzymes relates to the type of building block
incorporated. Some polyketides, including epothilone, incorporate an amino acid derived
building block. PKS enzymes that make such polyketides require specialized modules for
incorporation. Such modules are called non-ribosomal peptide synthetase (NRPS)
modules. The epothilone PKS, for example, contains an NRPS module. Another example
of a variation relates to additional activities in a module. For example, one module of the
epothilone PKS contains a methyltransferase (MT) domain, a heretofore unknown domain
of PKS enzymes that make modular polyketides.

The complete nucleotide sequence of the coding sequence of the open reading 
frames (ORFs) of the epothilone PKS genes and epothilone tailoring (modification) 
enzyme genes is provided in Example 1, below. This sequence information together with
the information provided below regarding the locations of the open reading frames of the 
genes within that sequence provides the amino acid sequence of the encoded proteins. 
Those of skill in the art will recognize that, due to the degenerate nature of the genetic 
code, a variety of DNA compounds differing in their nucleotide sequences can be used to 
encode a given amino acid sequence of the invention. The native DNA sequence encoding 
the epothilone PKS and epothilone modification enzymes of Sorangium cellulosum is 
shown herein merely to illustrate a preferred embodiment of the invention. The present 
invention includes DNA compounds of any sequence that encode the amino acid 
sequences of the polypeptides and proteins of the invention. In similar fashion, a 
polypeptide can typically tolerate one or more amino acid substitutions, deletions, and 
insertions in its amino acid sequence without loss or significant loss of a desired activity 
and, in some instances, even an improvement of a desired activity. The present invention 
includes such polypeptides with alternate amino acid sequences, and the amino acid 
sequences shown merely illustrate preferred embodiments of the invention.
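
The degeneracy of the genetic code mentioned above can be made concrete by counting how many distinct coding sequences specify one amino acid sequence. The sketch below is illustrative only: it uses the codon counts of the standard genetic code, the function name `count_encodings` is hypothetical, and the example peptides are arbitrary strings rather than sequences taken from the invention.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (single-letter amino acid codes; stop codons excluded).
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide):
    """Return the number of distinct DNA coding sequences for a peptide."""
    return prod(SYNONYMOUS_CODONS[residue] for residue in peptide.upper())


if __name__ == "__main__":
    # Arbitrary example peptides, not taken from the epothilone PKS.
    for peptide in ("MW", "MLS", "MLSRAG"):
        print(peptide, count_encodings(peptide))
```

Because the count is a product over residues, it grows very quickly with length, which is why a protein of thousands of residues can be encoded by an enormous number of different DNA compounds.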

The present invention provides recombinant genes for the production of
epothilones. The invention is exemplified by the cloning, characterization, and
manipulation of the epothilone PKS and modification enzymes of Sorangium cellulosum
SMP44. The description of the invention and the recombinant vectors deposited in
connection with that description enable the identification, cloning, and manipulation of
epothilone PKS and modification enzymes from any naturally occurring host cell that
produces an epothilone. Such host cells include other S. cellulosum strains, such as So ce
90, other Sorangium species, and non-Sorangium cells. Such identification, cloning, and
characterization can be conducted by those of ordinary skill in accordance with the present
invention using standard methodology for identifying homologous DNA sequences and
for identifying genes that encode a protein of function similar to a known protein.
Moreover, the present invention provides recombinant epothilone PKS and modification
enzyme genes that are synthesized de novo or are assembled from non-epothilone PKS
genes to provide an ordered array of domains and modules in one or more proteins that
assemble to form a PKS that produces epothilone or an epothilone derivative.

The recombinant nucleic acids, proteins, and peptides of the invention are many
and diverse. To facilitate an understanding of the invention and the diverse compounds
and methods provided thereby, the following discussion describes various regions of the
epothilone PKS and corresponding coding sequences. This discussion begins with a
general discussion of the genes that encode the PKS, the location of the various domains
and modules in those genes, and the location of the various domains in those modules.
Then, a more detailed discussion follows, focusing first on the loading domain, followed
by the NRPS module, and then the remaining eight modules of the epothilone PKS.

There are six epothilone PKS genes. The epoA gene encodes the 149 kDa loading
domain (which can also be referred to as a loading module). The epoB gene encodes
module 1, the 158 kDa NRPS module. The epoC gene encodes the 193 kDa module 2. The
epoD gene encodes a 765 kDa protein that comprises modules 3 through 6, inclusive. The
epoE gene encodes a 405 kDa protein that comprises modules 7 and 8. The epoF gene
encodes a 257 kDa protein that comprises module 9 and the thioesterase domain.
Immediately downstream of the epoF gene is epoK, the P450 epoxidase gene which
encodes a 47 kDa protein, followed immediately by the epoL gene, which may encode a
24 kDa dehydratase. The epoL gene is followed by a number of ORFs that include genes
believed to encode proteins involved in transport and regulation.

The sequences of these genes are shown in Example 1 in one contiguous sequence
or contig of 71,989 nucleotides. This contig also contains two genes that appear to
originate from a transposon and are identified below as ORF A and ORF B. These two
genes are believed not to be involved in epothilone biosynthesis but could possibly contain
sequences that function as a promoter or enhancer. The contig also contains more than 12
additional ORFs, only 12 of which, designated ORF2 through ORF12 and ORF2
complement, are identified below. As noted, ORF2 actually is two ORFs, because the
complement of the strand shown also comprises an ORF. The function of the
corresponding gene product, if any, of these ORFs has not yet been established. The Table
below provides the location of various open reading frames, module-coding sequences,
and domain encoding sequences within the contig sequence shown in Example 1. Those of
skill in the art will recognize, upon consideration of the sequence shown in Example 1,
that the actual start locations of several of the genes could differ from the start locations
shown in the table, because of the presence of in-frame codons for methionine or valine in
close proximity to the codon indicated as the start codon. The actual start codon can be
confirmed by amino acid sequencing of the proteins expressed from the genes.



Start    Stop     Comment
3        992      transposase gene ORF A, not part of the PKS
989      1501     transposase gene ORF B, not part of the PKS
1998     6263     epoA gene, encodes the loading domain
2031     3548     KS^Y of the loading domain
3621     4661     AT of the loading domain
4917     5810     ER of the loading domain, potentially involved in formation of the thiazole moiety
5856     6155     ACP of the loading domain
6260     10493    epoB gene, encodes module 1, the NRPS module
6620     6649     condensation domain C2 of the NRPS module
6861     6887     heterocyclization signature sequence
6962     6982     condensation domain C4 of the NRPS module
7358     7366     condensation domain C7 (partial) of the NRPS module
7898     7921     adenylation domain A1 of the NRPS module
8261     8308     adenylation domain A3 of the NRPS module
8411     8422     adenylation domain A4 of the NRPS module
8861     8905     adenylation domain A6 of the NRPS module
8966     8983     adenylation domain A7 of the NRPS module
9090     9179     adenylation domain A8 of the NRPS module
9183     9992     oxidation region for forming thiazole
10121    10138    adenylation domain A10 of the NRPS module
10261    10306    thiolation domain (PCP) of the NRPS module
10639    16137    epoC gene, encodes module 2
10654    12033    KS2, the KS domain of module 2
12250    13287    AT2, the AT domain of module 2
13327    13899    DH2, the DH domain of module 2
14962    15756    KR2, the KR domain of module 2
15763    16008    ACP2, the ACP domain of module 2
16134    37907    epoD gene, encodes modules 3-6
16425    17606    KS3
17817    18857    AT3
19581    20396    KR3
20424    20642    ACP3
20706    22082    KS4
22296    23336    AT4
24069    24647    KR4
24867    25151    ACP4
25203    26576    KS5
26793    27833    AT5
27966    28574    DH5
29433    30287    ER5
30321    30869    KR5
31077    31373    ACP5
31440    32807    KS6
33018    34067    AT6
34107    34676    DH6
35760    36641    ER6
36705    37256    KR6
37470    37769    ACP6
37912    49308    epoE gene, encodes modules 7 and 8
38014    39375    KS7
39589    40626    AT7
41341    41922    KR7
42181    42423    ACP7
42478    43851    KS8
44065    45102    AT8
45262    45810    DH (inactive)
46072    47172    MT8, the methyltransferase domain of module 8
48103    48636    KR8, this domain is inactive
48850    49149    ACP8
49323    56642    epoF gene, encodes module 9 and the TE domain
49416    50774    KS9
50985    52025    AT9
52173    53414    DH (inactive)
54747    55313    KR9
55593    55805    ACP9
55878    56600    TE9, the thioesterase domain
56757    58016    epoK gene, encodes the P450 epoxidase
58194    58733    epoL gene (putative dehydratase)
59405    59974    ORF2 complement, complement of strand shown
59460    60249    ORF2
60271    60738    ORF3, complement of strand shown
61730    62647    ORF4 (putative transporter)
63725    64333    ORF5
64372    65643    ORF6
66237    67472    ORF7 (putative oxidoreductase)
67572    68837    ORF8 (putative oxidoreductase membrane subunit)
68837    69373    ORF9
69993    71174    ORF10 (putative transporter)
71171    71542    ORF11
71557    71989    ORF12

With this overview of the organization and sequence of the epothilone gene cluster,
one can better appreciate the many different recombinant DNA compounds provided by
the present invention.
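
The gene coordinates listed in the table above lend themselves to simple arithmetic, such as the length of each gene and the gap or overlap between neighboring genes. The following sketch is illustrative only: the names `GENES`, `span_length`, and `intergenic_gaps` are hypothetical, and it assumes the Start and Stop values are 1-based, inclusive positions on the strand shown, which the text does not state explicitly.

```python
# Gene coordinates copied from the table of the contig (Start, Stop).
# Assumption: positions are 1-based and inclusive.
GENES = [
    ("epoA", 1998, 6263),
    ("epoB", 6260, 10493),
    ("epoC", 10639, 16137),
    ("epoD", 16134, 37907),
    ("epoE", 37912, 49308),
    ("epoF", 49323, 56642),
    ("epoK", 56757, 58016),
    ("epoL", 58194, 58733),
]


def span_length(start, stop):
    """Length in nucleotides of an inclusive coordinate span."""
    return stop - start + 1


def intergenic_gaps(genes):
    """Return (upstream, downstream, gap) tuples; a negative gap is an overlap."""
    gaps = []
    for (name_a, _, stop_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        gaps.append((name_a, name_b, start_b - stop_a - 1))
    return gaps


if __name__ == "__main__":
    for name, start, stop in GENES:
        print(name, span_length(start, stop), "nt")
    for upstream, downstream_gene, gap in intergenic_gaps(GENES):
        label = "overlap" if gap < 0 else "gap"
        print(upstream, "->", downstream_gene, label, abs(gap), "nt")
```

Under the stated assumption, the listed epoA and epoB spans share four nucleotides, as do epoC and epoD. Because the text cautions that the actual start codons of several genes may differ from those listed, such figures should be read as properties of the table rather than as confirmed gene boundaries.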


The epothilone PKS is a multiprotein complex composed of the gene products of the
epoA, epoB, epoC, epoD, epoE, and epoF genes. To confer the ability to produce
epothilones to a host cell, one provides the host cell with the recombinant epoA, epoB,
epoC, epoD, epoE, and epoF genes of the present invention, and optionally other genes,
capable of expression in that host cell. Those of skill in the art will appreciate that, while
the epothilone and other PKS enzymes may be referred to as a single entity herein, these
enzymes are typically multisubunit proteins. Thus, one can make a derivative PKS (a PKS
that differs from a naturally occurring PKS by deletion or mutation) or hybrid PKS (a PKS
that is composed of portions of two different PKS enzymes) by altering one or more genes
that encode one or more of the multiple proteins that constitute the PKS.

The post-PKS modification or tailoring of epothilone includes multiple steps
mediated by multiple enzymes. These enzymes are referred to herein as tailoring or
modification enzymes. Surprisingly, the products of the domains of the epothilone PKS
predicted to be functional by analysis of the genes that encode them are compounds that
have not been previously reported. These compounds are referred to herein as epothilones
G and H. Epothilones G and H lack the C-12-C-13 π-bond of epothilones C and D and the
C-12-C-13 epoxide of epothilones A and B, having instead a hydrogen and hydroxyl
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group at C-13, a single bond between C-12 and C-13, and a hydrogen and H or methyl 
group at C-12. These compounds are predicted to result from the epothilone PKS, because 
the DNA and corresponding amino acid sequence for module 4 of the epothilone PKS 
does not appear to include a DH domain. 

As described below, however, expression of the epothilone PKS genes epoA, epoB,
epoC, epoD, epoE, and epoF in certain heterologous host cells that do not express epoK or
epoL leads to the production of epothilones C and D, which lack the C-13 hydroxyl and
have a double bond between C-12 and C-13. The dehydration reaction that mediates the
formation of this double bond may be due to the action of an as yet unrecognized domain
of the epothilone PKS (for example, dehydration could occur in the next module, which
possesses an active DH domain and could generate a conjugated diene precursor prior to
its reduction by an ER domain) or an endogenous enzyme in the heterologous host
cells (Streptomyces coelicolor) in which it was observed. In the latter event, epothilones G
and H may be produced in Sorangium cellulosum or other host cells and then converted
to epothilones C and D by the action of a dehydratase, which may be encoded by the epoL
gene. In any event, epothilones C and D are converted to epothilones A and B by an
epoxidase encoded by the epoK gene. Epothilones A and B are converted to epothilones E
and F by a hydroxylase, which may be encoded by one of the ORFs identified above
or by another gene endogenous to Sorangium cellulosum. Thus, one can produce an
epothilone or epothilone derivative modified as desired in a host cell by providing that
host cell with one or more of the recombinant modification enzyme genes provided by the
invention or by utilizing a host cell that naturally expresses (or does not express) the
modification enzyme. Thus, in general, by utilizing the appropriate host and by
appropriate inactivation, if desired, of modification enzymes, one may interrupt the
progression of G → C → A → E or the corresponding downstream processing of
epothilone H at any desired point; by controlling methylation, one or both of the pathways
can be selected.
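
The progression described above can be sketched as a small Python model. It is a simplification for illustration only: it assumes each modification step goes to completion when the corresponding activity is present, and that the progression stops at the first missing activity. The activity labels and the function name `end_product` are hypothetical; which gene supplies the dehydratase or hydroxylase activity is, as noted above, not settled.

```python
# Order of intermediates for each series, as described in the text:
# G -> C -> A -> E and the corresponding H -> D -> B -> F.
PATHWAY = {
    False: ["G", "C", "A", "E"],  # series starting from epothilone G
    True: ["H", "D", "B", "F"],   # series starting from epothilone H
}

# Activity required for each successive step (labels are hypothetical).
STEPS = ["dehydratase", "epoxidase", "hydroxylase"]


def end_product(h_series: bool, activities: set) -> str:
    """Return the epothilone expected to accumulate, under the stated assumptions."""
    series = PATHWAY[h_series]
    position = 0
    for step in STEPS:
        if step not in activities:
            break  # the progression is interrupted at the first missing activity
        position += 1
    return "epothilone " + series[position]


# A host with dehydratase activity but no epoxidase (for example, no epoK).
print(end_product(False, {"dehydratase"}))
# A host with dehydratase and epoxidase activities, H series.
print(end_product(True, {"dehydratase", "epoxidase"}))
```

The model makes the point of the paragraph concrete: the choice of host and of inactivated modification enzymes determines where the progression stops.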

Thus, the present invention provides a wide variety of recombinant DNA 
compounds and host cells for expressing the naturally occurring epothilones A, B, C, and 
D and derivatives thereof. The invention also provides recombinant host cells, particularly
Sorangium cellulosum host cells, that produce epothilone derivatives modified in a manner
similar to epothilones E and F. Moreover, the invention provides host cells that can
produce the heretofore unknown epothilones G and H, either by expression of the 
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epothilone PKS genes in host cells that do not express the dehydratase that converts 
epothilones G and H to C and D or by mutating or altering the PKS to abolish the 
dehydratase function, if it is present in the epothilone PKS. 

The macrolide compounds that are products of the PKS cluster can thus be 
modified in various ways. In addition to the modifications described above, the PKS 
products can be glycosylated, hydroxylated, dehydroxylated, oxidized, methylated and
demethylated using appropriate enzymes. Thus, in addition to modifying the product of
the PKS cluster by altering the number, functionality, or specificity of the modules
contained in the PKS, additional compounds within the scope of the invention can be
produced by additional enzyme-catalyzed activity either provided by a host cell in which
the polyketide synthases are produced or by modifying these cells to contain additional
enzymes or by additional in vitro modification using purified enzymes or crude extracts
or, indeed, by chemical modification.

The present invention also provides a wide variety of recombinant DNA 
compounds and host cells that make epothilone derivatives. As used herein, the phrase 
“epothilone derivative” refers to a compound that is produced by a recombinant epothilone 
PKS in which at least one domain has been either rendered inactive, mutated to alter its 
catalytic function, or replaced by a domain with a different function or in which a domain 
has been inserted. In any event, the “epothilone derivative PKS” functions to produce a
compound that differs in structure from a naturally occurring epothilone but retains its ring
backbone structure and so is called an “epothilone derivative.” To facilitate a better
understanding of the recombinant DNA compounds and host cells provided by the
invention, a detailed discussion of the loading domain and each of the modules of the
epothilone PKS, as well as novel recombinant derivatives thereof, is provided below.

The loading domain of the epothilone PKS includes an inactive KS domain, KSY,
an AT domain specific for malonyl CoA (which is believed to be decarboxylated by the
KSY domain to yield an acetyl group), and an ACP domain. The present invention
provides recombinant DNA compounds that encode the epothilone loading domain. The
loading domain coding sequence is contained within an ~8.3 kb EcoRI restriction
fragment of cosmid pKOS35-70.8A3. The KS domain is referred to as inactive, because
the active site region TAYSSSL of the KS domain of the loading domain has a Y
residue in place of the cysteine required for ketosynthase activity; this domain does have
decarboxylase activity. See Witkowski et al., 7 Sep. 1999, Biochem. 38(36): 11643-11650,
incorporated herein by reference.

The presence of the Y residue in place of a Q residue (which occurs typically in an 
inactive loading domain KS) may make the KS domain less efficient at decarboxylation. 
The present invention provides a recombinant epothilone PKS loading domain and 
corresponding DNA sequences that encode an epothilone PKS loading domain in which 
the Y residue has been changed to a Q residue by changing the codon therefor in the 
coding sequence of the loading domain. The present invention also provides recombinant 
PKS enzymes comprising such loading domains and host cells for producing such 
enzymes and the polyketides produced thereby. These recombinant loading domains 
include those in which just the Y residue has been changed, those in which amino acids
surrounding and including the Y residue have been changed, and those in which the
complete KSY domain has been replaced by a complete KSQ domain. The latter
embodiment includes but is not limited to a recombinant epothilone loading domain in
which the KSY domain has been replaced by the KSQ domain of the oleandolide PKS or
the narbonolide PKS (see the references cited below in connection with the oleandomycin,
narbomycin, and picromycin PKS and modification enzymes).
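
The simplest of these variants, in which just the Y residue of the TAYSSSL active site region is changed to Q, can be sketched at the protein level in Python. The flanking residues in the example are invented placeholders, not the real loading domain sequence, and the function name `y_to_q` is hypothetical. In practice the change is made by altering the corresponding codon in the coding sequence.

```python
MOTIF = "TAYSSSL"  # active site region quoted in the text


def y_to_q(protein: str, motif: str = MOTIF) -> str:
    """Replace the Y of the first occurrence of the motif with Q."""
    start = protein.find(motif)
    if start == -1:
        raise ValueError("active site motif not found")
    y_index = start + motif.index("Y")
    return protein[:y_index] + "Q" + protein[y_index + 1:]


# Hypothetical flanking residues, for illustration only.
example = "GGP" + MOTIF + "VAL"
print(y_to_q(example))
```

Only the single residue within the motif is altered; any Y residues outside the motif are left untouched.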

The epothilone loading domain also contains an AT domain believed to bind 
malonyl CoA. The sequence “QTAFTQPALFTFEYALAALW...GHSIG” in the AT
domain is consistent with malonyl CoA specificity. As noted above, the malonyl CoA is
believed to be decarboxylated by the KSY domain to yield acetyl CoA. The present
invention provides recombinant epothilone derivative loading domains or their encoding 
DNA sequences in which the malonyl specific AT domain or its encoding sequence has 
been changed to another specificity, such as methylmalonyl CoA, ethylmalonyl CoA, and 
2-hydroxymalonyl CoA. When expressed with the other proteins of the epothilone PKS, 
such loading domains lead to the production of epothilones in which the methyl 
substituent of the thiazole ring of epothilone is replaced with, respectively, ethyl, propyl, 
and hydroxymethyl. The present invention provides recombinant PKS enzymes 
comprising such loading domains and host cells for producing such enzymes and the 
polyketides produced thereby. 
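
The correspondence between the AT specificity of the loading domain and the substituent on the thiazole ring, as stated above, can be written as a simple lookup in Python. The mapping follows from decarboxylation of the loaded unit: a malonyl unit bearing a substituent R gives a starter unit whose side group is R-CH2. The names `THIAZOLE_SUBSTITUENT` and `thiazole_substituent` are hypothetical.

```python
# Loading-domain AT substrate -> substituent on the thiazole ring, per the text.
# After decarboxylation, a substituent R on the malonyl unit ends up as R-CH2.
THIAZOLE_SUBSTITUENT = {
    "malonyl CoA": "methyl",                  # R = H (native specificity)
    "methylmalonyl CoA": "ethyl",             # R = CH3
    "ethylmalonyl CoA": "propyl",             # R = C2H5
    "2-hydroxymalonyl CoA": "hydroxymethyl",  # R = OH
}


def thiazole_substituent(at_substrate: str) -> str:
    """Look up the thiazole substituent expected for a given AT specificity."""
    if at_substrate not in THIAZOLE_SUBSTITUENT:
        raise ValueError(f"no substituent stated in the text for {at_substrate!r}")
    return THIAZOLE_SUBSTITUENT[at_substrate]


print(thiazole_substituent("ethylmalonyl CoA"))
```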

Those of skill in the art will recognize that an AT domain that is specific for
2-hydroxymalonyl CoA will result in a polyketide with a hydroxyl group at the
corresponding location in the polyketide produced, and that the hydroxyl group can be
methylated to yield a methoxy group by polyketide modification enzymes. See, e.g., the
patent applications cited in connection with the FK-520 PKS in the table below.
Consequently, reference to a PKS that has a 2-hydroxymalonyl specific AT domain herein
similarly refers to polyketides produced by that PKS that have either a hydroxyl or
methoxyl group at the corresponding location in the polyketide.

The loading domain of the epothilone PKS also comprises an ER domain. This
ER domain may be involved in forming one of the double bonds in the thiazole
moiety in epothilone (in the reverse of its normal reaction), or it may be non-functional. In
either event, the invention provides recombinant DNA compounds that encode the
epothilone PKS loading domain with and without the ER region, as well as hybrid loading
domains that contain an ER domain from another PKS (either active or inactive, with or
without accompanying KR and DH domains) in place of the ER domain of the epothilone
loading domain. The present invention also provides recombinant PKS enzymes
comprising such loading domains and host cells for producing such enzymes and the
polyketides produced thereby.

The recombinant nucleic acid compounds of the invention that encode the loading 
domain of the epothilone PKS and the corresponding polypeptides encoded thereby are 
useful for a variety of applications. In one embodiment, a DNA compound comprising a 
sequence that encodes the epothilone loading domain is coexpressed with the proteins of a
heterologous PKS. As used herein, reference to a heterologous modular PKS (or to the
coding sequence therefor) refers to all or part of a PKS, including each of the multiple
proteins constituting the PKS, that synthesizes a polyketide other than an epothilone or
epothilone derivative (or to the coding sequences therefor). This coexpression can be in
one of two forms. The epothilone loading domain can be coexpressed as a discrete protein
with the other proteins of the heterologous PKS or as a fusion protein in which the loading
domain is fused to one or more modules of the heterologous PKS. In either event, the
hybrid PKS formed, in which the loading domain of the heterologous PKS is replaced by
the epothilone loading domain, provides a novel PKS. Examples of a heterologous PKS
that can be used to prepare such hybrid PKS enzymes of the invention include but are not
limited to DEBS and the picromycin (narbonolide), oleandolide, rapamycin, FK-506,
FK-520, rifamycin, and avermectin PKS enzymes and their corresponding coding sequences.

In another embodiment, a nucleic acid compound comprising a sequence that 
encodes the epothilone loading domain is coexpressed with the proteins that constitute the
remainder of the epothilone PKS (i.e., the epoB, epoC, epoD, epoE, and epoF gene
products) or a recombinant epothilone PKS that produces an epothilone derivative due to
an alteration or mutation in one or more of the epoB, epoC, epoD, epoE, and epoF genes.
As used herein, reference to an epothilone PKS or a PKS that produces an epothilone derivative
(or to the coding sequence therefor) refers to all or any one of the proteins that comprise 
the PKS (or to the coding sequences therefor). 

In another embodiment, the invention provides recombinant nucleic acid 
compounds that encode a loading domain composed of part of the epothilone loading 
domain and part of a heterologous PKS. In this embodiment, the invention provides, for
example, replacing the malonyl CoA specific AT with a methylmalonyl CoA,
ethylmalonyl CoA, or 2-hydroxymalonyl CoA specific AT. This replacement, like the
others described herein, is typically mediated by replacing the coding sequences therefor 
to provide a recombinant DNA compound of the invention; the recombinant DNA is used 
to prepare the corresponding protein. Such changes (including not only replacements but 
also deletions and insertions) may be referred to herein either at the DNA or protein level. 

The compounds of the invention also include those in which both the KSY and AT
domains of the epothilone loading domain have been replaced but the ACP and/or linker
regions of the epothilone loading domain are left intact. Linker regions are those segments
of amino acids between domains in the loading domain and modules of a PKS that help
form the tertiary structure of the protein and are involved in correct alignment and
positioning of the domains of a PKS. These compounds include, for example, a
recombinant loading domain coding sequence in which the KSY and AT domain coding
sequences of the epothilone PKS have been replaced by the coding sequences for the KSQ
and AT domains of, for example, the oleandolide PKS or the narbonolide PKS. There are
also PKS enzymes that do not employ a KSQ domain but instead merely utilize an AT
domain that binds acetyl CoA, propionyl CoA, or butyryl CoA (the DEBS loading
domain) or isobutyryl CoA (the avermectin loading domain). Thus, the compounds of the
invention also include, for example, a recombinant loading domain coding sequence in
which the KSY and AT domain coding sequences of the epothilone PKS have been
replaced by an AT domain of the DEBS or avermectin PKS. The present invention also
provides recombinant DNA compounds encoding loading domains in which the ACP
domain or any of the linker regions of the epothilone loading domain has been replaced by
another ACP or linker region.
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Any of the above loading domain coding sequences is coexpressed with the other 
proteins that constitute a PKS that synthesizes epothilone, an epothilone derivative, or 
another polyketide to provide a PKS of the invention. If the product desired is epothilone 
or an epothilone derivative, then the loading domain coding sequence is typically 
expressed as a discrete protein, as is the loading domain in the naturally occurring
epothilone PKS. If the product desired is produced by the loading domain of the invention
and proteins from one or more non-epothilone PKS enzymes, then the loading domain is
expressed either as a discrete protein or as a fusion protein with one or more modules of
the heterologous PKS.

The present invention also provides hybrid PKS enzymes in which the epothilone
loading domain has been replaced in its entirety by a loading domain from a heterologous
PKS with the remainder of the PKS proteins provided by modified or unmodified
epothilone PKS proteins. The present invention also provides recombinant expression
vectors and host cells for producing such enzymes and the polyketides produced thereby.
In one embodiment, the heterologous loading domain is expressed as a discrete protein in
a host cell that expresses the epoB, epoC, epoD, epoE, and epoF gene products. In another
embodiment, the heterologous loading domain is expressed as a fusion protein with the
epoB gene product in a host cell that expresses the epoC, epoD, epoE, and epoF gene
products. In a related embodiment, the present invention provides recombinant epothilone
PKS enzymes in which the loading domain has been deleted and replaced by an NRPS
module and corresponding recombinant DNA compounds and expression vectors. In this
embodiment, the recombinant PKS enzymes thus produce an epothilone derivative that
comprises a dipeptide moiety, as in the compound leinamycin. The invention provides
such enzymes in which the remainder of the epothilone PKS is identical in function to the
native epothilone PKS as well as those in which the remainder is a recombinant PKS that
produces an epothilone derivative of the invention.

The present invention also provides reagents and methods useful in deleting the 
loading domain coding sequence or any portion thereof from the chromosome of a host 
cell, such as Sorangium cellulosum , or replacing those sequences or any portion thereof 
with sequences encoding a recombinant loading domain. Using a recombinant vector that
comprises DNA complementary to the DNA including and/or flanking the loading domain 
coding sequence in the Sorangium chromosome, one can employ the vector and 
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homologous recombination to replace the native loading domain coding sequence with a 
recombinant loading domain coding sequence or to delete the sequence altogether. 

Moreover, while the above discussion focuses on deleting or replacing the 
epothilone loading domain coding sequences, those of skill in the art will recognize that
the present invention provides recombinant DNA compounds, vectors, and methods useful
in deleting or replacing all or any portion of an epothilone PKS gene or an epothilone
modification enzyme gene. Such methods and materials are useful for a variety of
purposes. One purpose is to construct a host cell that does not make a naturally occurring
epothilone or epothilone derivative. For example, a host cell that has been modified to not
produce a naturally occurring epothilone may be particularly preferred for making
epothilone derivatives or other polyketides free of any naturally occurring epothilone.
Another purpose is to replace the deleted gene with a gene that has been altered so as to
provide a different product or to produce more of one product than another. 

If the epothilone loading domain coding sequence has been deleted or otherwise 
rendered non-functional in a Sorangium cellulosum host cell, then the resulting host cell 
will produce a non-functional epothilone PKS. This PKS could still bind and process 
extender units, but the thiazole moiety of epothilone would not form, leading to the 
production of a novel epothilone derivative. Because this derivative would predictably 
contain a free amino group, it would be produced at most in low quantities. As noted 
above, however, provision of a heterologous or other recombinant loading domain to the 
host cell would result in the production of an epothilone derivative with a structure
determined by the loading domain provided. 

The loading domain of the epothilone PKS is followed by the first module of the 
PKS, which is an NRPS module specific for cysteine. This NRPS module is naturally 
expressed as a discrete protein, the product of the epoB gene. The present invention 
provides the epoB gene in recombinant form. The recombinant nucleic acid compounds of 
the invention that encode the NRPS module of the epothilone PKS and the corresponding 
polypeptides encoded thereby are useful for a variety of applications. In one embodiment, 
a nucleic acid compound comprising a sequence that encodes the epothilone NRPS 
module is coexpressed with genes encoding one or more proteins of a heterologous PKS.
The NRPS module can be expressed as a discrete protein or as a fusion protein with one of
the proteins of the heterologous PKS. The resulting PKS, in which at least a module of the
heterologous PKS is replaced by the epothilone NRPS module or the NRPS module is in
effect added as a module to the heterologous PKS, provides a novel PKS. In another
embodiment, a DNA compound comprising a sequence that encodes the epothilone NRPS 
module is coexpressed with the other epothilone PKS proteins or modified versions 
thereof to provide a recombinant epothilone PKS that produces an epothilone or an 
epothilone derivative. 


Two hybrid PKS enzymes provided by the invention illustrate this aspect. Both 
hybrid PKS enzymes are hybrids of DEBS and the epothilone NRPS module. The first 
hybrid PKS is composed of four proteins: (i) DEBS1; (ii) a fusion protein composed of the
KS domain of module 3 of DEBS and all but the KS domain of the loading domain of the
epothilone PKS; (iii) the epothilone NRPS module; and (iv) a fusion protein composed of
the KS domain of module 2 of the epothilone PKS fused to the AT domain of module 5 of
DEBS and the rest of DEBS3. This hybrid PKS produces a novel polyketide with a
thiazole moiety incorporated into the macrolactone ring and a molecular weight of 413.53
when expressed in Streptomyces coelicolor. Glycosylated, hydroxylated, and methylated
derivatives can be produced by expression of the hybrid PKS in Saccharopolyspora
erythraea.

Diagrammatically, the construct is represented:

[Diagram not reproduced. Recoverable labels: DEBS1; KS3 (DEBS); ATo, NRPS, and KS2 (epo); AT5 (DEBS).]



The structure of the product is: 
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The second hybrid PKS illustrating this aspect of the invention is composed of five 
proteins: (i) DEBS1; (ii) a fusion protein composed of the KS domain of module 3 of
DEBS and all but the KS domain of the loading domain of the epothilone PKS; (iii) the
epothilone NRPS module; (iv) a fusion protein composed of the KS domain of module
2 of the epothilone PKS fused to the AT domain of module 4 of DEBS and the rest of
DEBS2; and (v) DEBS3. This hybrid PKS produces a novel polyketide with a thiazole
moiety incorporated into the macrolactone ring and a molecular weight of 455.61 when
expressed in Streptomyces coelicolor. Glycosylated, hydroxylated, and methylated
derivatives can be produced by expression of the hybrid PKS in Saccharopolyspora
erythraea.


Diagrammatically, the construct is represented:

[Diagram not reproduced. Recoverable labels: DEBS1; KS3 (DEBS); ATo, NRPS, and KS2 (epo); AT4 (DEBS).]


The structure of the product is: 



In another embodiment, a portion of the NRPS module coding sequence is utilized 
in conjunction with a heterologous coding sequence. In this embodiment, the invention 
provides, for example, changing the specificity of the NRPS module of the epothilone 
PKS from a cysteine to another amino acid. This change is accomplished by constructing a 
coding sequence in which all or a portion of the epothilone PKS NRPS module coding 
sequences have been replaced by those coding for an NRPS module of a different 
specificity. In one illustrative embodiment, the specificity of the epothilone NRPS module 
is changed from cysteine to serine or threonine. When the thus modified NRPS module is 
expressed with the other proteins of the epothilone PKS, the recombinant PKS produces 
an epothilone derivative in which the thiazole moiety of epothilone (or an epothilone 
derivative) is changed to an oxazole or 5-methyloxazole moiety, respectively.
Alternatively, the present invention provides recombinant PKS enzymes composed of the
products of the epoA, epoC, epoD, epoE, and epoF genes (or modified versions thereof)
without an NRPS module or with an NRPS module from a heterologous PKS. The
heterologous NRPS module can be expressed as a discrete protein or as a fusion protein
with either the epoA or epoC gene product.
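
The relationship stated above between the amino acid specificity of the NRPS module and the heterocycle found in the product can be expressed as a short Python lookup. Only the three cases named in the text are included; the names `HETEROCYCLE` and `predicted_heterocycle` are hypothetical.

```python
# NRPS module specificity -> heterocycle in the resulting compound, per the text.
HETEROCYCLE = {
    "cysteine": "thiazole",          # native specificity
    "serine": "oxazole",
    "threonine": "5-methyloxazole",
}


def predicted_heterocycle(amino_acid: str) -> str:
    """Return the heterocycle stated in the text for a given NRPS specificity."""
    key = amino_acid.lower()
    if key not in HETEROCYCLE:
        raise ValueError(f"no prediction given in the text for {amino_acid!r}")
    return HETEROCYCLE[key]


print(predicted_heterocycle("Serine"))
```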
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The invention also provides methods and reagents useful in changing the 
specificity of a heterologous NRPS module from another amino acid to cysteine. This 
change is accomplished by constructing a coding sequence in which the sequences that 
determine the specificity of the heterologous NRPS module have been replaced by those
that specify cysteine from the epothilone NRPS module coding sequence. The resulting
heterologous NRPS module is typically coexpressed in conjunction with the proteins
constituting a heterologous PKS that synthesizes a polyketide other than epothilone or an
epothilone derivative, although the heterologous NRPS module can also be used to
produce epothilone or an epothilone derivative.

In another embodiment, the invention provides recombinant epothilone PKS 
enzymes and corresponding recombinant nucleic acid compounds and vectors in which the 
NRPS module has been inactivated or deleted. Such enzymes, compounds, and vectors are 
constructed generally in accordance with the teaching for deleting or inactivating the 
epothilone PKS or modification enzyme genes above. Inactive NRPS module proteins and 
the coding sequences therefor provided by the invention include those in which the
peptidyl carrier protein (PCP) domain has been wholly or partially deleted or otherwise
rendered inactive by changing the active site serine (the site for phosphopantetheinylation)
to another amino acid, such as alanine, or the adenylation domains have been deleted or
otherwise rendered inactive. In one embodiment, both the loading domain and the NRPS 
have been deleted or rendered inactive. In any event, the resulting epothilone PKS can 
then function only if provided a substrate that binds to the KS domain of module 2 (or a 
subsequent module) of the epothilone PKS or a PKS for an epothilone derivative. In a 
method provided by the invention, the thus modified cells are then fed activated 
acylthioesters that are bound by preferably the second, but potentially any subsequent, 
module and processed into novel epothilone derivatives. 

Thus, in one embodiment, the invention provides Sorangium and non-Sorangium 
host cells that express an epothilone PKS (or a PKS that produces an epothilone 
derivative) with an inactive NRPS. The host cell is fed activated acylthioesters to produce 
novel epothilone derivatives of the invention. The host cells expressing, or cell free 
extracts containing, the PKS can be fed or supplied with N-acylcysteamine thioesters 
(NACS) of novel precursor molecules to prepare epothilone derivatives. See U.S. patent 
application Serial No. 60/117,384, filed 27 Jan. 1999, and PCT patent publication No.
US99/03986, both of which are incorporated herein by reference, and Example 6, below. 
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The second (first non-NRPS) module of the epothilone PKS includes a KS, an AT 
specific for methylmalonyl CoA, a DH, a KR, and an ACP. This module is encoded by a
sequence within an ~13.1 kb EcoRI-NsiI restriction fragment of cosmid pKOS35-70.8A3.

The recombinant nucleic acid compounds of the invention that encode the second 
module of the epothilone PKS and the corresponding polypeptides encoded thereby are 
useful for a variety of applications. The second module of the epothilone PKS is produced 
as a discrete protein by the epoC gene. The present invention provides the epoC gene in 
recombinant form. In one embodiment, a DNA compound comprising a sequence that 
encodes the epothilone second module is coexpressed with the proteins constituting a 
heterologous PKS either as a discrete protein or as a fusion protein with one or more 
modules of the heterologous PKS. The resulting PKS, in which a module of the 
heterologous PKS is either replaced by the second module of the epothilone PKS or the
latter is merely added to the modules of the heterologous PKS, provides a novel PKS. In 
another embodiment, a DNA compound comprising a sequence that encodes the second 
module of the epothilone PKS is coexpressed with the other proteins constituting the 
epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative. 

In another embodiment, all or only a portion of the second module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting either the DH or KR or both; replacing the DH
or KR or both with a DH or KR or both that specify a different stereochemistry; and/or 
inserting an ER. Generally, any reference herein to inserting or replacing a PKS KR, DH, 
and/or ER domain includes the replacement of the associated KR, DH, or ER domains in 
that module, typically with corresponding domains from the module from which the 
inserted or replacing domain is obtained. In addition, the KS and/or ACP can be replaced 
with another KS and/or ACP. In each of these replacements or insertions, the heterologous 
KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for 
another module of the epothilone PKS, from a gene for a PKS that produces a polyketide 
other than epothilone, or from chemical synthesis. The resulting heterologous second 
module coding sequence can be coexpressed with the other proteins that constitute a PKS 
that synthesizes epothilone, an epothilone derivative, or another polyketide. Alternatively, 
one can delete or replace the second module of the epothilone PKS with a module from a 
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heterologous PKS, which can be expressed as a discrete protein or as a fusion protein 
fused to either the epoB or epoD gene product. 
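
The effect of deleting or inserting reductive domains in a module, as discussed above, can be sketched in Python. The rule encoded below is the general logic of modular PKS enzymes (KR reduces the beta-keto group to a hydroxyl, DH dehydrates the hydroxyl to a double bond, ER reduces the double bond), not a statement from this specification about any particular epothilone intermediate. It also assumes that a DH or ER without the preceding activity has no effect. The function name `beta_carbon_state` and the domain labels are hypothetical.

```python
def beta_carbon_state(domains: set) -> str:
    """Predict the state of the beta-carbon left by a module's reductive domains."""
    if "KR" not in domains:
        return "keto"         # nothing reduces the beta-keto group
    if "DH" not in domains:
        return "hydroxyl"     # KR only
    if "ER" not in domains:
        return "double bond"  # KR + DH
    return "methylene"        # KR + DH + ER


# Domain content of the second module as described above.
module_2 = {"KS", "AT", "DH", "KR", "ACP"}

print(beta_carbon_state(module_2))                 # native domain set
print(beta_carbon_state(module_2 - {"DH"}))        # DH deleted
print(beta_carbon_state(module_2 - {"DH", "KR"}))  # DH and KR deleted
print(beta_carbon_state(module_2 | {"ER"}))        # ER inserted
```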

Illustrative recombinant PKS genes of the invention include those in which the AT 
domain encoding sequences for the second module of the epothilone PKS have been 
altered or replaced to change the AT domain encoded thereby from a methylmalonyl 
specific AT to a malonyl specific AT. Such malonyl specific AT domain encoding nucleic 
acids can be isolated, for example and without limitation, from the PKS genes encoding 
the narbonolide PKS, the rapamycin PKS (i.e., modules 2 and 12), and the FK-520 PKS 
(i.e., modules 3, 7, and 8). When such a hybrid second module is coexpressed with the 
other proteins constituting the epothilone PKS, the resulting epothilone derivative 
produced is a 16-desmethyl epothilone derivative. 

In addition, the invention provides DNA compounds and vectors encoding 
recombinant epothilone PKS enzymes and the corresponding recombinant proteins in 
which the KS domain of the second (or subsequent) module has been inactivated or 
deleted. In a preferred embodiment, this inactivation is accomplished by changing the 
codon for the active site cysteine to an alanine codon. As with the corresponding variants 
described above for the NRPS module, the resulting recombinant epothilone PKS enzymes 
are unable to produce an epothilone or epothilone derivative unless supplied a precursor 
that can be bound and extended by the remaining domains and modules of the 
recombinant PKS enzyme. Illustrative diketides are described in Example 6, below. 
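
The codon change described above, from the active site cysteine codon to an alanine codon, can be sketched in Python as follows. The nine-nucleotide sequence in the example is an invented placeholder, not part of the epothilone PKS coding sequence, and the choice of GCC as the alanine codon is an assumption made for illustration. The function name `cys_to_ala` is hypothetical.

```python
CYS_CODONS = {"TGT", "TGC"}                # standard genetic code
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}  # standard genetic code


def cys_to_ala(cds: str, codon_index: int, ala_codon: str = "GCC") -> str:
    """Replace the cysteine codon at a 0-based codon index with an alanine codon."""
    if ala_codon not in ALA_CODONS:
        raise ValueError("replacement is not an alanine codon")
    start = 3 * codon_index
    codon = cds[start:start + 3].upper()
    if codon not in CYS_CODONS:
        raise ValueError(f"codon {codon!r} at index {codon_index} is not cysteine")
    return cds[:start] + ala_codon + cds[start + 3:]


# Hypothetical fragment encoding Asp-Cys-Ser; the Cys codon is at index 1.
print(cys_to_ala("GACTGCTCG", 1))
```

Checking that the targeted codon really encodes cysteine guards against an off-by-one error in locating the active site.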

The third module of the epothilone PKS includes a KS, an AT specific for malonyl 
CoA, a KR, and an ACP. This module is encoded by a sequence within an ~8 kb
BglII-NsiI restriction fragment of cosmid pKOS35-70.8A3.

The recombinant DNA compounds of the invention that encode the third module 
of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for 
a variety of applications. The third module of the epothilone PKS is expressed in a protein, 
the product of the epoD gene, which also contains modules 4, 5, and 6. The present 
invention provides the epoD gene in recombinant form. The present invention also 
provides recombinant DNA compounds that encode each of the epothilone PKS modules 
3, 4, 5, and 6, as discrete coding sequences without coding sequences for the other 
epothilone modules. In one embodiment, a DNA compound comprising a sequence that 
encodes the epothilone third module is coexpressed with proteins constituting a 
heterologous PKS. The third module of the epothilone PKS can be expressed either as a
discrete protein or as a fusion protein fused to one or more modules of the heterologous
PKS. The resulting PKS, in which a module of the heterologous PKS is either replaced by 
that for the third module of the epothilone PKS or the latter is merely added to the 
modules of the heterologous PKS, provides a novel PKS. In another embodiment, a DNA 
compound comprising a sequence that encodes the third module of the epothilone PKS is 
coexpressed with proteins comprising the remainder of the epothilone PKS or a 
recombinant epothilone PKS that produces an epothilone derivative, typically as a protein 
comprising not only the third but also the fourth, fifth, and sixth modules. 

In another embodiment, all or a portion of the third module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this
embodiment, the invention provides, for example, either replacing the malonyl CoA
specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA
specific AT; deleting the KR; replacing the KR with a KR that specifies a different 
stereochemistry; and/or inserting a DH or a DH and an ER. As above, the reference to 
inserting a DH or a DH and an ER includes the replacement of the KR with a DH and KR 
or an ER, DH, and KR. In addition, the KS and/or ACP can be replaced with another KS 
and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, 
KR, ER, or ACP coding sequence can originate from a coding sequence for another 
module of the epothilone PKS, from a coding sequence for a PKS that produces a 
polyketide other than epothilone, or from chemical synthesis. The resulting heterologous 
third module coding sequence can be utilized in conjunction with a coding sequence for a 
PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. 

Illustrative recombinant PKS genes of the invention include those in which the AT 
domain encoding sequences for the third module of the epothilone PKS have been altered 
or replaced to change the AT domain encoded thereby from a malonyl specific AT to a 
methylmalonyl specific AT. Such methylmalonyl specific AT domain encoding nucleic 
acids can be isolated, for example and without limitation, from the PKS genes encoding 
DEBS, the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When 
coexpressed with the remaining modules and proteins of the epothilone PKS or an
epothilone PKS derivative, the recombinant PKS produces the 14-methyl epothilone
derivatives of the invention.

Those of skill in the art will recognize that the KR domain of the third module of 
the PKS is responsible for forming the hydroxyl group involved in cyclization of
epothilone. Consequently, abolishing the KR domain of the third module or adding a DH
or DH and ER domains will interfere with the cyclization, leading either to a linear 
molecule or to a molecule cyclized at a different location than is epothilone. 

The fourth module of the epothilone PKS includes a KS, an AT that can bind either
malonyl CoA or methylmalonyl CoA, a KR, and an ACP. This module is encoded by a
sequence within an ~10 kb NsiI-HindIII restriction fragment of cosmid pKOS35-70.1A2.

The recombinant DNA compounds of the invention that encode the fourth module 
of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for 
a variety of applications. In one embodiment, a DNA compound comprising a sequence 
that encodes the epothilone fourth module is inserted into a DNA compound that 
comprises the coding sequence for one or more modules of a heterologous PKS. The 
resulting construct encodes a protein in which a module of the heterologous PKS is either 
replaced by that for the fourth module of the epothilone PKS or the latter is merely added 
to the modules of the heterologous PKS. Together with other proteins that constitute the 
heterologous PKS, this protein provides a novel PKS. In another embodiment, a DNA 
compound comprising a sequence that encodes the fourth module of the epothilone PKS is 
expressed in a host cell that also expresses the remaining modules and proteins of the 
epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative. 
For making epothilone or epothilone derivatives, the recombinant fourth module is usually 
expressed in a protein that also contains the epothilone third, fifth, and sixth modules or 
modified versions thereof. 

In another embodiment, all or a portion of the fourth module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the malonyl CoA and 
methylmalonyl specific AT with a malonyl CoA, methylmalonyl CoA, ethylmalonyl CoA, 
or 2-hydroxymalonyl CoA specific AT; deleting the KR; and/or replacing the KR, 
including, optionally, to specify a different stereochemistry; and/or inserting a DH or a DH 
and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In 
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP 
coding sequence can originate from a coding sequence for another module of the 
epothilone PKS, from a gene for a PKS that produces a polyketide other than epothilone,
or from chemical synthesis. The resulting heterologous fourth module coding sequence is 
incorporated into a protein subunit of a recombinant PKS that synthesizes epothilone, an
epothilone derivative, or another polyketide. If the desired polyketide is an epothilone or
epothilone derivative, the recombinant fourth module is typically expressed as a protein
that also contains the third, fifth, and sixth modules of the epothilone PKS or modified 
versions thereof. Alternatively, the invention provides recombinant PKS enzymes for 
epothilones and epothilone derivatives in which the entire fourth module has been deleted 
or replaced by a module from a heterologous PKS. 

In a preferred embodiment, the invention provides recombinant DNA compounds 
comprising the coding sequence for the fourth module of the epothilone PKS modified to 
encode an AT that binds methylmalonyl CoA and not malonyl CoA. These recombinant
molecules are used to express a protein that is a recombinant derivative of the epoD 
protein that comprises the modified fourth module as well as modules 3, 5, and 6, any one 
or more of which can optionally be in derivative form, of the epothilone PKS. In another 
preferred embodiment, the invention provides recombinant DNA compounds comprising 
the coding sequence for the fourth module of the epothilone PKS modified to encode an 
AT that binds malonyl CoA and not methylmalonyl CoA. These recombinant molecules 
are used to express a protein that is a recombinant derivative of the epoD protein that 
comprises the modified fourth module as well as modules 3, 5, and 6, any one or more of 
which can optionally be in derivative form, of the epothilone PKS. 

Prior to the present invention, it was known that Sorangium cellulosum produced 
epothilones A, B, C, D, E, and F and that epothilones A, C, and E had a hydrogen at C-12, 
while epothilones B, D, and F had a methyl group at this position. Unappreciated prior to 
the present invention was the order in which these compounds were synthesized in 
S. cellulosum, and the mechanism by which some of the compounds had a hydrogen at
C-12 where others had a methyl group at this position. The present disclosure reveals that
epothilones A and B are derived from epothilones C and D by action of the epoK gene
product and that the presence of a hydrogen or methyl moiety at C-12 is due to the AT
domain of module 4 of the epothilone PKS. This domain can bind either malonyl or
methylmalonyl CoA and, consistent with its having greater similarity to malonyl specific
AT domains than to methylmalonyl specific AT domains, binds malonyl CoA more often
than methylmalonyl CoA.

Thus, the invention provides recombinant DNA compounds and expression vectors 
and the corresponding recombinant PKS in which the hybrid fourth module with a 
methylmalonyl specific AT has been incorporated. The methylmalonyl specific AT coding
sequence can originate, for example and without limitation, from coding sequences for the
oleandolide PKS, DEBS, the narbonolide PKS, the rapamycin PKS, or any other PKS that 
comprises a methylmalonyl specific AT domain. In accordance with the invention, the 
hybrid fourth module expressed from this coding sequence is incorporated into the 
epothilone PKS (or the PKS for an epothilone derivative), typically as a derivative epoD
gene product. The resulting recombinant epothilone PKS produces epothilones with a
methyl moiety at C-12, i.e., epothilone H (or an epothilone H derivative) if there is no 
dehydratase activity to form the C-12-C-13 alkene; epothilone D (or an epothilone D 
derivative), if the dehydratase activity but not the epoxidase activity is present; epothilone 
B (or an epothilone B derivative), if both the dehydratase and epoxidase activity but not 
the hydroxylase activity are present; and epothilone F (or an epothilone F derivative), if all 
three dehydratase, epoxidase, and hydroxylase activities are present. As indicated
parenthetically above, the cell will produce the corresponding epothilone derivative if 
there have been other changes to the epothilone PKS. 

If the recombinant PKS comprising the hybrid methylmalonyl specific fourth 
module is expressed in, for example, Sorangium cellulosum, the appropriate modifying 
enzymes are present (unless they have been rendered inactive in accordance with the 
methods herein), and epothilones D, B, and/or F are produced. Such production is 
typically carried out in a recombinant S. cellulosum provided by the present invention in 
which the native epothilone PKS is unable to function at all or unable to function except in 
conjunction with the recombinant fourth module provided. In an illustrative example, one 
can use the methods and reagents of the invention to render inactive the epoD gene in the 
native host. Then, one can transform that host with a vector comprising the recombinant 
epoD gene containing the hybrid fourth module coding sequence. The recombinant vector 
can exist as an extrachromosomal element or as a segment of DNA integrated into the host 
cell chromosome. In the latter embodiment, the invention provides that one can simply 
integrate the recombinant methylmalonyl specific module 4 coding sequence into wild- 
type S. cellulosum by homologous recombination with the native epoD gene to ensure that 
only the desired epothilone is produced. The invention provides that the S. cellulosum host 
can either express or not express (by mutation or homologous recombination of the native 
genes therefor) the dehydratase, epoxidase, and/or oxidase gene products and thus form or 
not form the corresponding epothilone D, B, and F compounds, as the practitioner elects.

Sorangium cellulosum modified as described above is only one of the recombinant
host cells provided by the invention. In a preferred embodiment, the recombinant
methylmalonyl specific epothilone fourth module coding sequences are used in
accordance with the methods of the invention to produce epothilone D, B, and F (or their
corresponding derivatives) in heterologous host cells. Thus, the invention provides
reagents and methods for introducing the epothilone or epothilone derivative PKS and
epothilone dehydratase, epoxidase, and hydroxylase genes and combinations thereof into
heterologous host cells.

The recombinant methylmalonyl specific epothilone fourth module coding
sequences provided by the invention afford important alternative methods for producing
desired epothilone compounds in host cells. Thus, the invention provides a hybrid fourth
module coding sequence in which, in addition to the replacement of the endogenous AT
coding sequence with a coding sequence for an AT specific for methylmalonyl CoA,
coding sequences for a DH and KR from, for example and without limitation, module 10 of
the rapamycin PKS or modules 1 or 5 of the FK-520 PKS have replaced the endogenous
KR coding sequences. When the gene product comprising the hybrid fourth module and
epothilone PKS modules 3, 5, and 6 (or derivatives thereof) encoded by this coding
sequence is incorporated into a PKS comprising the other epothilone PKS proteins (or
derivatives thereof) produced in a host cell, the cell makes either epothilone D or its trans
stereoisomer (or derivatives thereof), depending on the stereochemical specificity of the
inserted DH and KR domains.

Similarly, and as noted above, the invention provides recombinant DNA 
compounds comprising the coding sequence for the fourth module of the epothilone PKS 
modified to encode an AT that binds malonyl CoA and not methylmalonyl CoA. The
invention provides recombinant DNA compounds and vectors and the corresponding
recombinant PKS in which this hybrid fourth module has been incorporated into a 
derivative epoD gene product. When incorporated into the epothilone PKS (or the PKS for 
an epothilone derivative), the resulting recombinant epothilone PKS produces epothilones 
C, A, and E, depending, again, on whether epothilone modification enzymes are present. 

As noted above, depending on the host, whether the fourth module includes a KR and DH
domain, and on whether and which of the dehydratase, epoxidase, and oxidase activities
are present, the practitioner of the invention can produce one or more of the epothilone G,
C, A, and E compounds and derivatives thereof using the compounds, host cells, and
methods of the invention.
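
The dependence of the product on the module 4 AT specificity and on the modifying activities can be sketched as a small decision function. This is an illustration under stated assumptions: the function and parameter names are hypothetical, the three activities are treated as strictly sequential steps, and the assignment of epothilone G to the malonyl series without dehydratase activity is inferred by analogy with epothilone H rather than quoted from the text.

```python
def epothilone_product(module4_at: str, dehydratase: bool,
                       epoxidase: bool, hydroxylase: bool) -> str:
    """Name the epothilone expected for a given module 4 AT specificity and set of activities.

    Assumes each activity acts only on the product of the previous one
    (dehydratase, then epoxidase, then hydroxylase).
    """
    series = {
        "methylmalonyl": ("H", "D", "B", "F"),  # methyl at C-12
        "malonyl": ("G", "C", "A", "E"),        # hydrogen at C-12; "G" assigned by analogy
    }
    if module4_at not in series:
        raise ValueError("module4_at must be 'malonyl' or 'methylmalonyl'")
    no_dh, dh_only, dh_epox, all_three = series[module4_at]
    if not dehydratase:
        return no_dh        # no C-12-C-13 alkene is formed
    if not epoxidase:
        return dh_only      # alkene present, not epoxidized
    if not hydroxylase:
        return dh_epox      # epoxide present, not hydroxylated
    return all_three


print(epothilone_product("methylmalonyl", True, False, False))
print(epothilone_product("malonyl", True, True, False))
```

With a methylmalonyl specific AT and dehydratase activity alone the sketch returns D, and with a malonyl specific AT plus dehydratase and epoxidase activities it returns A, matching the two series described above.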

The fifth module of the epothilone PKS includes a KS, an AT that binds malonyl
CoA, a DH, an ER, a KR, and an ACP. This module is encoded by a sequence within an
~12.4 kb NsiI-NotI restriction fragment of cosmid pKOS35-70.1A2.

The recombinant DNA compounds of the invention that encode the fifth module of 
the epothilone PKS and the corresponding polypeptides encoded thereby are useful for a 
variety of applications. In one embodiment, a DNA compound comprising a sequence that 
encodes the epothilone fifth module is inserted into a DNA compound that comprises the 
coding sequence for one or more modules of a heterologous PKS. The resulting construct,
in which the coding sequence for a module of the heterologous PKS is either replaced by 
that for the fifth module of the epothilone PKS or the latter is merely added to coding 
sequences for the modules of the heterologous PKS, can be incorporated into an 
expression vector and used to produce the recombinant protein encoded thereby. When the 
recombinant protein is combined with the other proteins of the heterologous PKS, a novel
PKS is produced. In another embodiment, a DNA compound comprising a sequence that 
encodes the fifth module of the epothilone PKS is inserted into a DNA compound that 
comprises coding sequences for the epothilone PKS or a recombinant epothilone PKS that 
produces an epothilone derivative. In the latter constructs, the epothilone fifth module is 
typically expressed as a protein comprising the third, fourth, and sixth modules of the
epothilone PKS or derivatives thereof. 

In another embodiment, a portion of the fifth module coding sequence is utilized in 
conjunction with other PKS coding sequences to create a hybrid module coding sequence 
and the hybrid module encoded thereby. In this embodiment, the invention provides, for 
example, either replacing the malonyl CoA specific AT with a methylmalonyl CoA,
ethylmalonyl CoA, or 2-hydroxymalonyl CoA specific AT; deleting any one, two, or all
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, DH, 
and KR with either a KR, a DH and KR, or a KR, DH, and ER, including, optionally, to 
specify a different stereochemistry. In addition, the KS and/or ACP can be replaced with 
another KS and/or ACP. In each of these replacements or insertions, the heterologous KS,
AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for 
another module of the epothilone PKS, from a coding sequence for a PKS that produces a 
polyketide other than epothilone, or from chemical synthesis. The resulting hybrid fifth
module coding sequence can be utilized in conjunction with a coding sequence for a PKS
that synthesizes epothilone, an epothilone derivative, or another polyketide. Alternatively,
the fifth module of the epothilone PKS can be deleted or replaced in its entirety by a
module of a heterologous PKS to produce a protein that in combination with the other
proteins of the epothilone PKS or derivatives thereof constitutes a PKS that produces an
epothilone derivative.

Illustrative recombinant PKS genes of the invention include recombinant epoD 
gene derivatives in which the AT domain encoding sequences for the fifth module of the 
epothilone PKS have been altered or replaced to change the AT domain encoded thereby 
from a malonyl specific AT to a methylmalonyl specific AT. Such methylmalonyl specific 
AT domain encoding nucleic acids can be isolated, for example and without limitation, 
from the PKS genes encoding DEBS, the narbonolide PKS, the rapamycin PKS, and the 
FK-520 PKS. When such recombinant epoD gene derivatives are coexpressed with the 
epoA, epoB, epoC, epoE, and epoF genes (or derivatives thereof), the PKS composed 
thereof produces the 10-methyl epothilones or derivatives thereof. Another recombinant
epoD gene derivative provided by the invention includes not only this altered module 5 
coding sequence but also module 4 coding sequences that encode an AT domain that binds 
only methylmalonyl CoA. When incorporated into a PKS with the epoA, epoB, epoC,
epoE, and epoF genes, the recombinant epoD gene derivative product leads to the 
production of 10-methyl epothilone B and/or D derivatives. 

Other illustrative recombinant epoD gene derivatives of the invention include those 
in which the ER, DH, and KR domain encoding sequences for the fifth module of the
epothilone PKS have been replaced with those encoding (i) a KR and DH domain; (ii) a 
KR domain; and (iii) an inactive KR domain. These recombinant epoD gene derivatives of 
the invention are coexpressed with the epoA, epoB, epoC, epoE, and epoF genes to 
produce a recombinant PKS that makes the corresponding (i) C-11 alkene, (ii) C-11
hydroxy, and (iii) C-11 keto epothilone derivatives. These recombinant epoD gene
derivatives can also be coexpressed with recombinant epo genes containing other 
alterations or can themselves be further altered to produce a PKS that makes the 
corresponding C-11 epothilone derivatives. For example, one recombinant epoD gene
derivative provided by the invention also includes module 4 coding sequences that encode 
an AT domain that binds only methylmalonyl CoA. When incorporated into a PKS with
the epoA, epoB, epoC, epoE, and epoF genes, the recombinant epoD gene derivative
product leads to the production of the corresponding C-11 epothilone B and/or D
derivatives. 
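
The relationship between the reductive domains present in a module and the functional group that results can be summarized in a short sketch. The function name and the domain labels are illustrative assumptions. The alkene, hydroxy, and keto outcomes follow the text above, while the fully reduced (methylene) outcome for an intact KR, DH, and ER set is the generally accepted polyketide rule and is not stated explicitly here.

```python
def beta_carbon_group(active_domains) -> str:
    """Map the set of active reductive domains in a module to the group left at the beta-carbon."""
    domains = frozenset(active_domains)
    allowed = {"KR", "DH", "ER"}
    if not domains <= allowed:
        raise ValueError(f"unknown domain(s): {sorted(domains - allowed)}")
    if domains == {"KR", "DH", "ER"}:
        return "methylene"   # full reduction (assumed general rule)
    if domains == {"KR", "DH"}:
        return "alkene"
    if domains == {"KR"}:
        return "hydroxy"
    if not domains:
        return "keto"        # KR inactive or absent
    raise ValueError("DH or ER without an active KR is not covered by this sketch")


# The three module 5 replacements (i) to (iii) described above, which act at C-11
for label, doms in [("(i)", {"KR", "DH"}), ("(ii)", {"KR"}), ("(iii)", set())]:
    print(label, beta_carbon_group(doms))
```

An inactive KR is represented here by an empty set, because only catalytically active domains change the oxidation state of the beta-carbon.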

Functionally similar epoD genes for producing the epothilone C-11 derivatives can
also be made by inactivation of one, two, or all three of the ER, DH, and KR domains of 
the epothilone fifth module. However, the preferred mode for altering such domains in any 
module is by replacement with the complete set of desired domains taken from another
module of the same or a heterologous PKS coding sequence. In this manner, the natural 
architecture of the PKS is conserved. Also, when present, KR and DH or KR, DH, and ER 
domains that function together in a native PKS are preferably used in the recombinant 
PKS. Illustrative replacement domains for the substitutions described above include, for 
example and without limitation, the inactive KR domain from the rapamycin PKS module 
3 to form the ketone, the KR domain from the rapamycin PKS module 5 to form the 
alcohol, and the KR and DH domains from the rapamycin PKS module 4 to form the 
alkene. Other such inactive KR, active KR, and active KR and DH domain encoding 
nucleic acids can be isolated from, for example and without limitation, the PKS genes 
encoding DEBS, the narbonolide PKS, and the FK-520 PKS. Each of the resulting PKS 
enzymes produces a polyketide compound that comprises a functional group at the C-11
position that can be further derivatized in vitro by standard chemical methodology to yield 
semi-synthetic epothilone derivatives of the invention. 

The sixth module of the epothilone PKS includes a KS, an AT that binds
methylmalonyl CoA, a DH, an ER, a KR, and an ACP. This module is encoded by a
sequence within an ~14.5 kb HindIII-NsiI restriction fragment of cosmid
pKOS35-70.1A2.

The recombinant DNA compounds of the invention that encode the sixth module 
of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for 
a variety of applications. In one embodiment, a DNA compound comprising a sequence 
that encodes the epothilone sixth module is inserted into a DNA compound that comprises 
the coding sequence for one or more modules of a heterologous PKS. The resulting protein 
encoded by the construct, in which the coding sequence for a module of the heterologous 
PKS is either replaced by that for the sixth module of the epothilone PKS or the latter is 
merely added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS when coexpressed with the other proteins comprising the PKS. In another 
embodiment, a DNA compound comprising a sequence that encodes the sixth module of
the epothilone PKS is inserted into a DNA compound that comprises the coding sequence
for modules 3, 4, and 5 of the epothilone PKS or a recombinant epothilone PKS that 
produces an epothilone derivative and coexpressed with the other proteins of the 
epothilone or epothilone derivative PKS to produce a PKS that makes epothilone or an 
epothilone derivative in a host cell.

In another embodiment, a portion of the sixth module coding sequence is utilized 
in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the methylmalonyl CoA
specific AT with a malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA specific
AT; deleting any one, two, or all three of the ER, DH, and KR; and/or replacing any one,
two, or all three of the ER, DH, and KR with either a KR, a DH and KR, or a KR, DH, and
ER, including, optionally, to specify a different stereochemistry. In addition, the KS and/or
ACP can be replaced with another KS and/or ACP. In each of these replacements or
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the epothilone PKS, from a coding
sequence for a PKS that produces a polyketide other than epothilone, or from chemical
synthesis. The resulting heterologous sixth module coding sequence can be utilized in
conjunction with a coding sequence for a protein subunit of a PKS that makes epothilone,
an epothilone derivative, or another polyketide. If the PKS makes epothilone or an
epothilone derivative, the hybrid sixth module is typically expressed as a protein
comprising modules 3, 4, and 5 of the epothilone PKS or derivatives thereof.
Alternatively, the sixth module of the epothilone PKS can be deleted or replaced in its
entirety by a module from a heterologous PKS to produce a PKS for an epothilone
derivative.

Illustrative recombinant PKS genes of the invention include those in which the AT
domain encoding sequences for the sixth module of the epothilone PKS have been altered
or replaced to change the AT domain encoded thereby from a methylmalonyl specific AT
to a malonyl specific AT. Such malonyl specific AT domain encoding nucleic acids can be
isolated from, for example and without limitation, the PKS genes encoding the
narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When a recombinant epoD
gene of the invention encoding such a hybrid module 6 is coexpressed with the other
epothilone PKS genes, the recombinant PKS makes the 8-desmethyl epothilone
derivatives. This recombinant epoD gene derivative can also be coexpressed with
recombinant epo gene derivatives containing other alterations or can itself be further
altered to produce a PKS that makes the corresponding 8-desmethyl epothilone
derivatives. For example, one recombinant epoD gene provided by the invention also
includes module 4 coding sequences that encode an AT domain that binds only
methylmalonyl CoA. When incorporated into a PKS with the epoA, epoB, epoC, epoE,
and epoF genes, the recombinant epoD gene product leads to the production of the
8-desmethyl derivatives of epothilones B and D.

Other illustrative recombinant epoD gene derivatives of the invention include those 
in which the ER, DH, and KR domain encoding sequences for the sixth module of the 
epothilone PKS have been replaced with those that encode (i) a KR and DH domain; (ii) a
KR domain; and (iii) an inactive KR domain. These recombinant epoD gene derivatives of 
the invention, when coexpressed with the other epothilone PKS genes, make the
corresponding (i) C-9 alkene, (ii) C-9 hydroxy, and (iii) C-9 keto epothilone derivatives. 
These recombinant epoD gene derivatives can also be coexpressed with other recombinant 
epo gene derivatives containing other alterations or can themselves be further altered to
produce a PKS that makes the corresponding C-9 epothilone derivatives. For example, one 
recombinant epoD gene derivative provided by the invention also includes module 4 
coding sequences that encode an AT domain that binds only methylmalonyl CoA. When 
incorporated into a PKS with the epoA, epoB, epoC, epoE, and epoF genes, the
recombinant epoD gene product leads to the production of the C-9 derivatives of
epothilones B and D.

Functionally equivalent sixth modules can also be made by inactivation of one, 
two, or all three of the ER, DH, and KR domains of the epothilone sixth module. The 
preferred mode for altering such domains in any module is by replacement with the 
complete set of desired domains taken from another module of the same or a heterologous
PKS coding sequence. Illustrative replacement domains for the substitutions described 
above include but are not limited to the inactive KR domain from the rapamycin PKS 
module 3 to form the ketone, the KR domain from the rapamycin PKS module 5 to form 
the alcohol, and the KR and DH domains from the rapamycin PKS module 4 to form the 
alkene. Other such inactive KR, active KR, and active KR and DH domain encoding
nucleic acids can be isolated from, for example and without limitation, the PKS genes
encoding DEBS, the narbonolide PKS, and the FK-520 PKS. Each of the resulting PKSs
produces a polyketide compound that comprises a functional group at the C-9 position that
can be further derivatized in vitro by standard chemical methodology to yield
semi-synthetic epothilone derivatives of the invention.

The seventh module of the epothilone PKS includes a KS, an AT specific for 
methylmalonyl CoA, a KR, and an ACP. This module is encoded by a sequence within an 
~8.7 kb BglII restriction fragment from cosmid pKOS35-70.4.

The recombinant DNA compounds of the invention that encode the seventh 
module of the epothilone PKS and the corresponding polypeptides encoded thereby are 
useful for a variety of applications. The seventh module of the epothilone PKS is 
contained in the gene product of the epoE gene, which also contains the eighth module. 
The present invention provides the epoE gene in recombinant form, but also provides 
DNA compounds that encode the seventh module without coding sequences for the eighth 
module as well as DNA compounds that encode the eighth module without coding 
sequences for the seventh module. In one embodiment, a DNA compound comprising a 
sequence that encodes the epothilone seventh module is inserted into a DNA compound 
that comprises the coding sequence for one or more modules of a heterologous PKS. The 
resulting construct, in which the coding sequence for a module of the heterologous PKS is 
either replaced by that for the seventh module of the epothilone PKS or the latter is merely 
added to coding sequences for the modules of the heterologous PKS, provides a novel 
PKS coding sequence that can be expressed in a host cell. Alternatively, the epothilone 
seventh module can be expressed as a discrete protein. In another embodiment, a DNA 
compound comprising a sequence that encodes the seventh module of the epothilone PKS 
is expressed to form a protein that, together with other proteins, constitutes the epothilone 
PKS or a PKS that produces an epothilone derivative. In these embodiments, the seventh 
module is typically expressed as a protein comprising the eighth module of the epothilone 
PKS or a derivative thereof and coexpressed with the epoA, epoB, epoC, epoD, and epoF
genes or derivatives thereof to constitute the PKS. 

In another embodiment, a portion or all of the seventh module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the methylmalonyl CoA
specific AT with a malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA specific
AT; deleting the KR; replacing the KR with a KR that specifies a different 
stereochemistry; and/or inserting a DH or a DH and an ER. In addition, the KS and/or 
ACP can be replaced with another KS and/or ACP. In each of these replacements or
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the epothilone PKS, from a coding 
sequence for a PKS that produces a polyketide other than epothilone, or from chemical 
synthesis. The resulting heterologous seventh module coding sequence is utilized, 
optionally in conjunction with other coding sequences, to express a protein that together 
with other proteins constitutes a PKS that synthesizes epothilone, an epothilone derivative, 
or another polyketide. When used to prepare epothilone or an epothilone derivative, the 
seventh module is typically expressed as a protein comprising the eighth module or 
derivative thereof and coexpressed with the epoA, epoB, epoC, epoD, and epoF genes or
derivatives thereof to constitute the PKS. Alternatively, the coding sequences for the 
seventh module in the epoE gene can be deleted or replaced by those for a heterologous 
module to prepare a recombinant epoE gene derivative that, together with the epoA, epoB,
epoC, epoD, and epoF genes, can be expressed to make a PKS for an epothilone
derivative. 

Illustrative recombinant epoE gene derivatives of the invention include those in 
which the AT domain encoding sequences for the seventh module of the epothilone PKS 
have been altered or replaced to change the AT domain encoded thereby from a 
methylmalonyl specific AT to a malonyl specific AT. Such malonyl specific AT domain 
encoding nucleic acids can be isolated from, for example and without limitation, the PKS 
genes encoding the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When 
coexpressed with the other epothilone PKS genes, epoA, epoB, epoC, epoD, and epoF, or 
derivatives thereof, a PKS for an epothilone derivative with a C-6 hydrogen, instead of a 
C-6 methyl, is produced. Thus, if the genes contain no other alterations, the compounds 
produced are the 6-desmethyl epothilones. 
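
The AT replacement described above can be sketched as a small data model. This is an illustrative sketch only: the `Module` class, the `swap_at` helper, and the `SIDE_CHAIN` lookup are hypothetical names, and the model assumes simply that a malonyl specific AT leaves a hydrogen and a methylmalonyl specific AT leaves a methyl on the carbon set by that module.

```python
from dataclasses import dataclass, replace

# Simplified assumption: substituent left on the alpha carbon by each extender unit.
SIDE_CHAIN = {"malonyl": "H", "methylmalonyl": "CH3"}


@dataclass(frozen=True)
class Module:
    """Hypothetical minimal description of one PKS extension module."""
    number: int
    at_specificity: str  # "malonyl" or "methylmalonyl"


def swap_at(module: Module, new_specificity: str) -> Module:
    """Return a copy of the module carrying a replacement AT specificity."""
    if new_specificity not in SIDE_CHAIN:
        raise ValueError(f"unknown AT specificity: {new_specificity}")
    return replace(module, at_specificity=new_specificity)


def side_chain(module: Module) -> str:
    """Substituent predicted at the position controlled by this module."""
    return SIDE_CHAIN[module.at_specificity]


# Module 7 of the epothilone PKS has a methylmalonyl specific AT (C-6 methyl).
native_module7 = Module(number=7, at_specificity="methylmalonyl")
# The recombinant epoE derivative carries a malonyl specific AT instead.
hybrid_module7 = swap_at(native_module7, "malonyl")

print("C-6 substituent:", side_chain(native_module7), "->", side_chain(hybrid_module7))
```

In this model the change from `CH3` to `H` at the position set by module 7 corresponds to the 6-desmethyl epothilones named in the text.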

The eighth module of the epothilone PKS includes a KS, an AT specific for 
methylmalonyl CoA, inactive KR and DH domains, a methyltransferase (MT) domain, 
and an ACP. This module is encoded by a sequence within an ~10 kb NotI restriction 
fragment of cosmid pKOS35-79.85. 

The recombinant DNA compounds of the invention that encode the eighth module 
of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for 
a variety of applications. In one embodiment, a DNA compound comprising a sequence 
that encodes the epothilone eighth module is inserted into a DNA compound that 
comprises the coding sequence for one or more modules of a heterologous PKS. The 
resulting construct, in which the coding sequence for a module of the heterologous PKS is 
either replaced by that for the eighth module of the epothilone PKS or the latter is merely 
added to coding sequences for modules of the heterologous PKS, provides a novel PKS 
coding sequence that is expressed with the other proteins constituting the PKS to provide a 
novel PKS. Alternatively, the eighth module can be expressed as a discrete protein that 
can associate with other PKS proteins to constitute a novel PKS. In another embodiment, a 
DNA compound comprising a sequence that encodes the eighth module of the epothilone 
PKS is coexpressed with the other proteins constituting the epothilone PKS or a PKS that 
produces an epothilone derivative. In these embodiments, the eighth module is typically 
expressed as a protein that also comprises the seventh module or a derivative thereof. 

In another embodiment, a portion or all of the eighth module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the methylmalonyl CoA 
specific AT with a malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA specific 
AT; deleting the inactive KR and/or the inactive DH; replacing the inactive KR and/or DH 
with an active KR and/or DH; and/or inserting an ER. In addition, the KS and/or ACP can 
be replaced with another KS and/or ACP. In each of these replacements or insertions, the 
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding 
sequence for another module of the epothilone PKS, from a coding sequence for a PKS 
that produces a polyketide other than epothilone, or from chemical synthesis. The resulting 
heterologous eighth module coding sequence is expressed as a protein that is utilized in 
conjunction with the other proteins that constitute a PKS that synthesizes epothilone, an 
epothilone derivative, or another polyketide. When used to prepare epothilone or an 
epothilone derivative, the heterologous or hybrid eighth module is typically expressed as a 
recombinant epoE gene product that also contains the seventh module. Alternatively, the 
coding sequences for the eighth module in the epoE gene can be deleted or replaced by 
those for a heterologous module to prepare a recombinant epoE gene that, together with 
the epoA, epoB, epoC, epoD, and epoF genes, can be expressed to make a PKS for an 
epothilone derivative. 

The eighth module of the epothilone PKS also comprises a methylation or 
methyltransferase (MT) domain with an activity that methylates the epothilone precursor. 
This function can be deleted to produce a recombinant epoE gene derivative of the 
invention, which can be expressed with the other epothilone PKS genes or derivatives 
thereof to make an epothilone derivative that lacks one or both methyl groups, 
depending on whether the AT domain of the eighth module has been changed to a malonyl 
specific AT domain, at the corresponding C-4 position of the epothilone molecule. In 
another important embodiment, the present invention provides recombinant DNA 
compounds that encode a polypeptide with this methylation domain and activity and a 
variety of recombinant PKS coding sequences that encode recombinant PKS enzymes that 
incorporate this polypeptide. The availability of this MT domain and the coding sequences 
therefor provides a significant number of new polyketides that differ from known 
polyketides by the presence of at least an additional methyl group. The MT domain of the 
invention can in effect be added to any PKS module to direct the methylation at the 
corresponding location in the polyketide produced by the PKS. As but one illustrative 
example, the present invention provides the recombinant nucleic acid compounds resulting 
from inserting the coding sequence for this MT activity into a coding sequence for any one 
or more of the six modules of the DEBS enzyme to produce a recombinant DEBS that 
synthesizes a 6-deoxyerythronolide B derivative that comprises one or more additional 
methyl groups at the C-2, C-4, C-6, C-8, C-10, and/or C-12 positions. In such constructs, 
the MT domain can be inserted adjacent to the AT or the ACP. 
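
The relationship between the DEBS module that receives the MT domain and the ring position that gains a methyl group can be sketched as simple arithmetic. The sketch below is illustrative and rests on a labeled assumption: the last of the six DEBS extension modules sets C-2, and each earlier module sets a position two carbons further along the chain, which reproduces the C-2 through C-12 positions listed above. The function names are hypothetical.

```python
def alpha_carbon_position(module_number: int, n_modules: int = 6) -> int:
    """Ring position of the alpha carbon set by a given extension module.

    Assumption: the final module (n_modules) sets C-2 and every earlier
    module sets a carbon two positions further from C-1.
    """
    if not 1 <= module_number <= n_modules:
        raise ValueError("module number out of range")
    return 2 * (n_modules - module_number) + 2


def methylated_positions(modules_with_mt):
    """Sorted ring positions that would carry an additional methyl group."""
    return sorted(alpha_carbon_position(m) for m in set(modules_with_mt))


# MT inserted into modules 5 and 6 of a recombinant DEBS.
print(["C-%d" % p for p in methylated_positions([5, 6])])
```

Inserting the MT coding sequence into all six modules gives, under the same assumption, the full set of positions named in the text.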

The ninth module of the epothilone PKS includes a KS, an AT specific for malonyl 
CoA, a KR, an inactive DH, and an ACP. This module is encoded by a sequence within an 
~14.7 kb HindIII-BglII restriction fragment of cosmid pKOS35-79.85. 

The recombinant DNA compounds of the invention that encode the ninth module 
of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for 
a variety of applications. The ninth module of the epothilone PKS is expressed as a 
protein, the product of the epoF gene, that also contains the TE domain of the epothilone 
PKS. The present invention provides the epoF gene in recombinant form, as well as DNA 
compounds that encode the ninth module without the coding sequences for the TE domain 
and DNA compounds that encode the TE domain without the coding sequences for the 
ninth module. In one embodiment, a DNA compound comprising a sequence that encodes 
the epothilone ninth module is inserted into a DNA compound that comprises the coding 
sequence for one or more modules of a heterologous PKS. The resulting construct, in 
which the coding sequence for a module of the heterologous PKS is either replaced by that 
for the ninth module of the epothilone PKS or the latter is merely added to coding 
sequences for the modules of the heterologous PKS, provides a novel PKS protein coding 
sequence that when coexpressed with the other proteins constituting a PKS provides a 
novel PKS. The ninth module coding sequence can also be expressed as a discrete protein 
with or without an attached TE domain. In another embodiment, a DNA compound 
comprising a sequence that encodes the ninth module of the epothilone PKS is expressed 
as a protein together with other proteins to constitute an epothilone PKS or a PKS that 
produces an epothilone derivative. In these embodiments, the ninth module is typically 
expressed as a protein that also contains the TE domain of either the epothilone PKS or a 
heterologous PKS. 

In another embodiment, a portion or all of the ninth module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the malonyl CoA 
specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA 
specific AT; deleting the KR; replacing the KR with a KR that specifies a different 
stereochemistry; and/or inserting a DH or a DH and an ER. In addition, the KS and/or 
ACP can be replaced with another KS and/or ACP. In each of these replacements or 
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate 
from a coding sequence for another module of the epothilone PKS, from a coding 
sequence for a PKS that produces a polyketide other than epothilone, or from chemical 
synthesis. The resulting heterologous ninth module coding sequence is coexpressed with 
the other proteins constituting a PKS that synthesizes epothilone, an epothilone derivative, 
or another polyketide. Alternatively, the present invention provides a PKS for an 
epothilone or epothilone derivative in which the ninth module has been replaced by a 
module from a heterologous PKS or has been deleted in its entirety. In the latter 
embodiment, the TE domain is expressed as a discrete protein or fused to the eighth 
module. 

The ninth module of the epothilone PKS is followed by a thioesterase domain. This 
domain is encoded in the ~14.7 kb HindIII-BglII restriction fragment comprising the ninth module 
coding sequence. The present invention provides recombinant DNA compounds that 
encode hybrid PKS enzymes in which the ninth module of the epothilone PKS is fused to 
a heterologous thioesterase or one or more modules of a heterologous PKS are fused to the 
epothilone PKS thioesterase. Thus, for example, a thioesterase domain coding sequence 
from another PKS can be inserted at the end of the ninth module ACP coding sequence in 
recombinant DNA compounds of the invention. Recombinant DNA compounds encoding 
this thioesterase domain are therefore useful in constructing DNA compounds that encode 
a protein of the epothilone PKS, a PKS that produces an epothilone derivative, and a PKS 
that produces a polyketide other than epothilone or an epothilone derivative. 

In one important embodiment, the present invention thus provides a hybrid PKS 
and the corresponding recombinant DNA compounds that encode the proteins constituting 
those hybrid PKS enzymes. For purposes of the present invention a hybrid PKS is a 
recombinant PKS that comprises all or part of one or more modules, loading domain, and 
thioesterase/cyclase domain of a first PKS and all or part of one or more modules, loading 
domain, and thioesterase/cyclase domain of a second PKS. In one preferred embodiment, 
the first PKS is most but not all of the epothilone PKS, and the second PKS is only a 
portion or all of a non-epothilone PKS. An illustrative example of such a hybrid PKS 
includes an epothilone PKS in which the natural loading domain has been replaced with a 
loading domain of another PKS. Another example of such a hybrid PKS is an epothilone 
PKS in which the AT domain of module four is replaced with an AT domain from a 
heterologous PKS that binds only methylmalonyl CoA. In another preferred embodiment, 
the first PKS is most but not all of a non-epothilone PKS, and the second PKS is only a 
portion or all of the epothilone PKS. An illustrative example of such a hybrid PKS 
includes an erythromycin PKS in which an AT specific for methylmalonyl CoA is 
replaced with an AT from the epothilone PKS specific for malonyl CoA. Another example 
is an erythromycin PKS that includes the MT domain of the epothilone PKS. 

Those of skill in the art will recognize that all or part of either the first or second 
PKS in a hybrid PKS of the invention need not be isolated from a naturally occurring 
source. For example, only a small portion of an AT domain determines its specificity. See 
U.S. patent application Serial No. 09/346,860 and PCT patent application No. WO 
US99/15047, each of which is incorporated herein by reference. The state of the art in 
DNA synthesis allows the artisan to construct de novo DNA compounds of size sufficient 
to construct a useful portion of a PKS module or domain. For purposes of the present 
invention, such synthetic DNA compounds are deemed to be a portion of a PKS. 

The following Table lists references describing illustrative PKS genes and 
corresponding enzymes that can be utilized in the construction of the recombinant PKSs 
and the corresponding DNA compounds that encode them of the invention. Also presented 
are various references describing polyketide tailoring and modification enzymes and 
corresponding genes that can be employed to make the recombinant DNA compounds of 
the present invention. 

Avermectin 

U.S. Pat. No. 5,252,474 to Merck. 

MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied Molecular 
Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A Comparison of the 
Genes Encoding the Polyketide Synthases for Avermectin, Erythromycin, and 
Nemadectin. 

MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the 
Streptomyces avermitilis genes encoding the avermectin polyketide synthase. 

Ikeda and Omura, 1997, Chem. Res. 97: 2599-2609, Avermectin biosynthesis. 

Candicidin (FR008) 

Hu et al., 1994, Mol. Microbiol. 14: 163-172. 

Erythromycin 

PCT Pub. No. 93/13663 to Abbott. 

U.S. Pat. No. 5,824,513 to Abbott. 

Donadio et al., 1991, Science 252: 675-9. 

Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large multifunctional 
polypeptide in the erythromycin producing polyketide synthase of Saccharopolyspora 
erythraea. 

Glycosylation Enzymes 

PCT Pat. App. Pub. No. 97/23630 to Abbott. 

FK-506 

Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone ring of 
the immunosuppressant FK-506, Eur. J. Biochem. 256: 528-534. 

Motamedi et al., 1997, Structural organization of a multifunctional polyketide 
synthase involved in the biosynthesis of the macrolide immunosuppressant FK-506, Eur. J. 
Biochem. 244: 74-80. 

Methyltransferase 

US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from Streptomyces 
MA6858. 31-O-desmethyl-FK-506 methyltransferase. 

Motamedi et al., 1996, Characterization of methyltransferase and hydroxylase 
genes involved in the biosynthesis of the immunosuppressants FK-506 and FK-520, J. 
Bacteriol. 178: 5243-5248. 

FK-520 

U.S. patent application Serial No. 09/154,083, filed 16 Sep. 1998. 

U.S. patent application Serial No. 09/410,551, filed 1 Oct. 1999. 

Nielsen et al., 1991, Biochem. 30: 5789-96. 

Lovastatin 

U.S. Pat. No. 5,744,350 to Merck. 

Narbomycin 

U.S. patent application Serial No. 60/107,093, filed 5 Nov. 1998. 

Nemadectin 

MacNeil et al., 1993, supra. 

Niddamycin 

Kakavas et al., 1997, Identification and characterization of the niddamycin 
polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179: 7515-7522. 

Oleandomycin 

Swan et al., 1994, Characterisation of a Streptomyces antibioticus gene encoding a 
type I polyketide synthase which has an unusual coding sequence, Mol. Gen. Genet. 242: 
358-362. 

U.S. patent application Serial No. 60/120,254, filed 16 Feb. 1999, Serial No. 
09/ *, filed 28 Oct. 1999, claiming priority thereto by inventors S. Shah, M. Betlach, 
R. McDaniel, and L. Tang, attorney docket No. 30063-20029.00. 

Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal region 
involved in oleandomycin biosynthesis, which encodes two glycosyltransferases 
responsible for glycosylation of the macrolactone ring, Mol. Gen. Genet. 259(3): 299- 
308. 

Picromycin 

PCT patent application No. WO US99/11814, filed 28 May 1999. 

U.S. patent application Serial No. 09/320,878, filed 27 May 1999. 

U.S. patent application Serial No. 09/141,908, filed 28 Aug. 1998. 

Xue et al., 1998, Hydroxylation of macrolactones YC-17 and narbomycin is 
mediated by the pikC-encoded cytochrome P450 in Streptomyces venezuelae, Chemistry 
& Biology 5(11): 661-667. 

Xue et al., Oct. 1998, A gene cluster for macrolide antibiotic biosynthesis in 
Streptomyces venezuelae: Architecture of metabolic diversity, Proc. Natl. Acad. Sci. USA 
95: 12111-12116. 

Platenolide 

EP Pat. App. Pub. No. 791,656 to Lilly. 

Pradimicin 

PCT Pat. Pub. No. WO 98/11230 to Bristol-Myers Squibb. 

Rapamycin 

Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the polyketide 
rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843. 

Aparicio et al., 1996, Organization of the biosynthetic gene cluster for rapamycin 
in Streptomyces hygroscopicus: analysis of the enzymatic domains in the modular 
polyketide synthase, Gene 169: 9-16. 

Rifamycin 

PCT Pat. Pub. No. WO 98/07868 to Novartis. 

August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic rifamycin: 
deductions from the molecular analysis of the biosynthetic gene cluster of 
Amycolatopsis mediterranei S669, Chemistry & Biology 5(2): 69-79. 

Sorangium PKS 

U.S. patent application Serial No. 09/144,085, filed 31 Aug. 1998. 

Soraphen 

U.S. Pat. No. 5,716,849 to Novartis. 

Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium cellulosum 
(Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide Antibiotic Soraphen 
A: Cloning, Characterization, and Homology to Polyketide Synthase Genes from 
Actinomycetes. 

Spiramycin 

U.S. Pat. No. 5,098,837 to Lilly. 

Activator Gene 

U.S. Pat. No. 5,514,544 to Lilly. 

Tylosin 

U.S. Pat. No. 5,876,991 to Lilly. 

EP Pub. No. 791,655 to Lilly. 

Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide through 
the construction of a hybrid polyketide synthase. 

Tailoring enzymes 

Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355. Analysis of five 
tylosin biosynthetic genes from the tylBA region of the Streptomyces fradiae genome. 

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing the 
hybrid PKS-encoding DNA compounds of the invention. Methods for constructing hybrid 
PKS-encoding DNA compounds are described without reference to the epothilone PKS in 
U.S. Patent Nos. 5,672,491 and 5,712,146 and U.S. patent application Serial Nos. 
09/073,538, filed 6 May 1998, and 09/141,908, filed 28 Aug 1998, each of which is 
incorporated herein by reference. Preferred PKS enzymes and coding sequences for the 
proteins which constitute them for purposes of isolating heterologous PKS domain coding 
sequences for constructing hybrid PKS enzymes of the invention are the soraphen PKS 
and the PKS described as a Sorangium PKS in the above table. 

To summarize the functions of the genes cloned and sequenced in Example 1: 


Gene    Protein   Modules   Domains Present 

epoA    EpoA      Load      KS^Y mAT ER ACP 
epoB    EpoB      1         NRPS, condensation, heterocyclization, 
                            adenylation, thiolation, PCP 
epoC    EpoC      2         KS mmAT DH KR ACP 
epoD    EpoD      3         KS mAT KR ACP 
                  4         KS mAT KR ACP 
                  5         KS mAT DH ER KR ACP 
                  6         KS mmAT DH ER KR ACP 
epoE    EpoE      7         KS mmAT KR ACP 
                  8         KS mmAT MT DH* KR* ACP 
epoF    EpoF      9         KS mAT KR DH* ACP TE 

NRPS - non-ribosomal peptide synthetase; KS - ketosynthase; mAT - malonyl CoA 
specifying acyltransferase; mmAT - methylmalonyl CoA specifying acyltransferase; DH - 
dehydratase; ER - enoylreductase; KR - ketoreductase; MT - methyltransferase; TE - 
thioesterase; * - inactive domain. 
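
The gene, module, and domain assignments summarized above lend themselves to a lookup structure. The following sketch is illustrative only: the dictionary layout and the helper names (`gene_for_module`, `inactive_domains`, `modules_with`) are hypothetical, the NRPS module is collapsed to a single entry, and the loading domain KS is written as a plain `KS`.

```python
# Gene -> module -> ordered domain list, transcribed from the summary table.
# A trailing "*" marks a domain the table lists as inactive.
EPOTHILONE_PKS = {
    "epoA": {"Load": ["KS", "mAT", "ER", "ACP"]},
    "epoB": {"1": ["NRPS"]},
    "epoC": {"2": ["KS", "mmAT", "DH", "KR", "ACP"]},
    "epoD": {
        "3": ["KS", "mAT", "KR", "ACP"],
        "4": ["KS", "mAT", "KR", "ACP"],
        "5": ["KS", "mAT", "DH", "ER", "KR", "ACP"],
        "6": ["KS", "mmAT", "DH", "ER", "KR", "ACP"],
    },
    "epoE": {
        "7": ["KS", "mmAT", "KR", "ACP"],
        "8": ["KS", "mmAT", "MT", "DH*", "KR*", "ACP"],
    },
    "epoF": {"9": ["KS", "mAT", "KR", "DH*", "ACP", "TE"]},
}


def gene_for_module(module: str) -> str:
    """Name of the gene whose product contains the given module."""
    for gene, modules in EPOTHILONE_PKS.items():
        if module in modules:
            return gene
    raise KeyError(module)


def inactive_domains(module: str) -> list:
    """Domains of a module that the table marks as inactive."""
    domains = EPOTHILONE_PKS[gene_for_module(module)][module]
    return [d.rstrip("*") for d in domains if d.endswith("*")]


def modules_with(domain: str) -> list:
    """Modules containing an active copy of the named domain."""
    return [m for mods in EPOTHILONE_PKS.values()
            for m, doms in mods.items() if domain in doms]


print(gene_for_module("8"), inactive_domains("8"), modules_with("mmAT"))
```

A structure of this kind makes it straightforward to ask which gene must be altered to change a given module.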

The hybrid PKS-encoding DNA compounds of the invention can be and often are 
hybrids of more than two PKS genes. Even where only two genes are used, there are often 
two or more modules in the hybrid gene in which all or part of the module is derived from 
a second (or third) PKS gene. Illustrative examples of recombinant epothilone derivative 
PKS genes of the invention, which are identified by listing the specificities of the hybrid 
modules (the other modules having the same specificity as the epothilone PKS), include: 

(a) module 4 with methylmalonyl specific AT (mm AT) and a KR and module 2 
with a malonyl specific AT (m AT) and a KR; 

(b) module 4 with mm AT and a KR and module 3 with mm AT and a KR; 

(c) module 4 with mm AT and a KR and module 5 with mm AT and an ER, DH, 
and KR; 

(d) module 4 with mm AT and a KR and module 5 with mm AT and a DH and 
KR; 

(e) module 4 with mm AT and a KR and module 5 with mm AT and a KR; 

(f) module 4 with mm AT and a KR and module 5 with mm AT and an inactive 
KR; 

(g) module 4 with mm AT and a KR and module 6 with m AT and an ER, DH, and 
KR; 

(h) module 4 with mm AT and a KR and module 6 with m AT and a DH and KR; 

(i) module 4 with mm AT and a KR and module 6 with m AT and a KR; 

(j) module 4 with mm AT and a KR and module 6 with m AT and an inactive KR; 

(k) module 4 with mm AT and a KR and module 7 with m AT; 

(l) hybrids (c) through (f), except that module 5 has a m AT; 

(m) hybrids (g) through (j) except that module 6 has a mm AT; and 

(n) hybrids (a) through (m) except that module 4 has a m AT. 

The above list is illustrative only and should not be construed as limiting the invention, 
which includes other recombinant epothilone PKS genes and enzymes with not only two 
hybrid modules other than those shown but also with three or more hybrid modules. 



WO 00/31247 



PCT/US99/27438 



Those of skill in the art will appreciate that a hybrid PKS of the invention includes 
but is not limited to a PKS of any of the following types: (i) an epothilone or epothilone 
derivative PKS that contains a module in which at least one of the domains is from a 
heterologous module; (ii) an epothilone or epothilone derivative PKS that contains a 
module from a heterologous PKS; (iii) an epothilone or epothilone derivative PKS that 
contains a protein from a heterologous PKS; and (iv) combinations of the foregoing. 

While an important embodiment of the present invention relates to hybrid PKS 
genes, the present invention also provides recombinant epothilone PKS genes in which 
there is no second PKS gene sequence present but which differ from the epothilone PKS 
gene by one or more deletions. The deletions can encompass one or more modules and/or 
can be limited to a partial deletion within one or more modules. When a deletion 
encompasses an entire module other than the NRPS module, the resulting epothilone 
derivative is at least two carbons shorter than the compound produced from the PKS from 
which the deleted version was derived. The deletion can also encompass the NRPS 
module and/or the loading domain, as noted above. When a deletion is within a module, 
the deletion typically encompasses a KR, DH, or ER domain, or both DH and ER 
domains, or both KR and DH domains, or all three KR, DH, and ER domains. 
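
The "at least two carbons" figure for a whole-module deletion can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: each extension module contributes two backbone carbons, a methylmalonyl specific AT contributes one further carbon as a methyl branch, and any methyl added by an MT domain is ignored. The names `EXTRA_CARBONS` and `carbons_lost` are hypothetical.

```python
BACKBONE_CARBONS_PER_MODULE = 2

# Assumed additional carbons carried by the extender unit as a side chain.
EXTRA_CARBONS = {"malonyl": 0, "methylmalonyl": 1}


def carbons_lost(deleted_at_specificities):
    """Carbons removed when whole extension modules are deleted.

    The argument lists the AT specificity of each deleted module.
    """
    return sum(BACKBONE_CARBONS_PER_MODULE + EXTRA_CARBONS[s]
               for s in deleted_at_specificities)


print(carbons_lost(["malonyl"]))                   # one malonyl module
print(carbons_lost(["methylmalonyl"]))             # one methylmalonyl module
print(carbons_lost(["malonyl", "methylmalonyl"]))  # two modules deleted
```

Under these assumptions the minimum loss per deleted module is two carbons, with more lost when the deleted module would have introduced a methyl branch.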

The catalytic properties of the domains and modules of the epothilone PKS and of 
epothilone modification enzymes can also be altered by random or site specific 
mutagenesis of the corresponding genes. A wide variety of mutagenizing agents and 
methods are known in the art and are suitable for this purpose. The technique known as 
DNA shuffling can also be employed. See, e.g., U.S. Patent Nos. 5,830,721; 5,811,238; 
and 5,605,793; and references cited therein, each of which is incorporated herein by 
reference. 

Recombinant Manipulations 

To construct a hybrid PKS or epothilone derivative PKS gene of the invention, or 
simply to express unmodified epothilone biosynthetic genes, one can employ a technique, 
described in PCT Pub. No. 98/27203 and U.S. patent application Serial Nos. 08/989,332, 
filed 11 Dec. 1997, and 60/129,731, filed 16 April 1999, each of which is incorporated 
herein by reference, in which the various genes of the PKS are divided into two or more, 
often three, segments, and each segment is placed on a separate expression vector. In this 
manner, the full complement of genes can be assembled and manipulated more readily for 
heterologous expression, and each of the segments of the gene can be altered, and various 
altered segments can be combined in a single host cell to provide a recombinant PKS of 
the invention. This technique makes more efficient the construction of large libraries of 
recombinant PKS genes, vectors for expressing those genes, and host cells comprising 
those vectors. In this and other contexts, the genes encoding the desired PKS are not only 
present on two or more vectors, but also can be ordered or arranged differently than in the 
native producer organism from which the genes were derived. Various examples of this 
technique as applied to the epothilone PKS are described in the Examples below. In one 
embodiment, the epoA, epoB, epoC, and epoD genes are present on a first plasmid, and the 
epoE and epoF and optionally either the epoK or the epoK and epoL genes are present on a 
second (or third) plasmid. 
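
The division of the PKS genes across separate expression vectors can be represented as a simple layout that is checked for completeness. The sketch below is illustrative: the plasmid labels `plasmid_1` and `plasmid_2` and the `check_layout` helper are hypothetical, and the layout shown follows the embodiment described above with epoK as the optional modification enzyme gene.

```python
REQUIRED_PKS_GENES = {"epoA", "epoB", "epoC", "epoD", "epoE", "epoF"}


def check_layout(layout):
    """Return (missing, duplicated) gene names for a plasmid layout.

    `layout` maps a plasmid label to the list of genes it carries.
    """
    counts = {}
    for genes in layout.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    missing = sorted(REQUIRED_PKS_GENES - set(counts))
    duplicated = sorted(g for g, n in counts.items() if n > 1)
    return missing, duplicated


# Two-plasmid embodiment: epoA-epoD on the first, epoE, epoF, and epoK on the second.
layout = {
    "plasmid_1": ["epoA", "epoB", "epoC", "epoD"],
    "plasmid_2": ["epoE", "epoF", "epoK"],
}

missing, duplicated = check_layout(layout)
print("missing:", missing, "duplicated:", duplicated)
```

A check of this kind is useful when many altered segments are combined into a library, since each host cell must still receive the full complement of PKS genes.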

Thus, in one important embodiment, the recombinant nucleic acid compounds of 
the invention are expression vectors. As used herein, the term "expression vector" refers to 
any nucleic acid that can be introduced into a host cell or cell-free transcription and 
translation medium. An expression vector can be maintained stably or transiently in a cell, 
whether as part of the chromosomal or other DNA in the cell or in any cellular 
compartment, such as a replicating vector in the cytoplasm. An expression vector also 
comprises a gene that serves to produce RNA that is translated into a polypeptide in the 
cell or cell extract. Thus, the vector typically includes a promoter to enhance gene 
expression but alternatively may serve to incorporate the relevant coding sequence under 
the control of an endogenous promoter. Furthermore, expression vectors may typically 
contain additional functional elements, such as resistance-conferring genes to act as 
selectable markers and regulatory genes to enhance promoter activity. 

The various components of an expression vector can vary widely, depending on the 
intended use of the vector. In particular, the components depend on the host cell(s) in 
which the vector will be used or is intended to function. Vector components for expression 
and maintenance of vectors in E. coli are widely known and commercially available, as are 
vector components for other commonly used organisms, such as yeast cells and 
Streptomyces cells. 

In one embodiment, the vectors of the invention are used to transform Sorangium 
host cells to provide the recombinant Sorangium host cells of the invention. U.S. Pat. No. 
5,686,295, incorporated herein by reference, describes a method for transforming 
Sorangium host cells, although other methods may also be employed. Sorangium is a 
convenient host for expressing epothilone derivatives of the invention in which the 
recombinant PKS that produces such derivatives is expressed from a recombinant vector 
in which the epothilone PKS gene promoter is positioned to drive expression of the 
recombinant coding sequence. The epothilone PKS gene promoter is provided in 
recombinant form by the present invention and is an important embodiment thereof. The 
promoter is contained within an ~500 nucleotide sequence between the end of the 
transposon sequences and the start site of the open reading frame of the epoA gene. 
Optionally, one can include sequences from further upstream of this 500 bp region in the 
promoter. Those of skill in the art will recognize that, if a Sorangium host that produces 
epothilone is used as the host cell, the recombinant vector need drive expression of only a 
portion of the PKS containing the altered sequences. Thus, such a vector may comprise 
only a single altered epothilone PKS gene, with the remainder of the epothilone PKS 
polypeptides provided by the genes in the host cell chromosomal DNA. If the host cell 
naturally produces an epothilone, the epothilone derivative will thus be produced in a 
mixture containing the naturally occurring epothilone(s). 

Those of skill will also recognize that the recombinant DNA compounds of the 
invention can be used to construct Sorangium host cells in which one or more genes 
involved in epothilone biosynthesis have been rendered inactive. Thus, the invention 
provides such Sorangium host cells, which may be preferred host cells for expressing 
epothilone derivatives of the invention so that complex mixtures of epothilones are 
avoided. Particularly preferred host cells of this type include those in which one or more 
of any of the epothilone PKS gene ORFs has been disrupted, and/or those in which any one or 
more of the epothilone modification enzyme genes have been disrupted. Such host cells 
are typically constructed by a process involving homologous recombination using a vector 
that contains DNA homologous to the regions flanking the gene segment to be altered and 
positioned so that the desired homologous double crossover recombination event 
will occur. 

Homologous recombination can thus be used to delete, disrupt, or alter a gene. In a 
preferred illustrative embodiment, the present invention provides a recombinant 
epothilone producing Sorangium cellulosum host cell in which the epoK gene has been 
deleted or disrupted by homologous recombination using a recombinant DNA vector of 
the invention. This host cell, unable to make the epoK epoxidase gene product, is unable to 
make epothilones A and B and so is a preferred source of epothilones C and D. 

Homologous recombination can also be used to alter the specificity of a PKS 
module by replacing coding sequences for the module or domain of a module to be altered 
with those specifying a module or domain of the desired specificity. In another preferred 
illustrative embodiment, the present invention provides a recombinant epothilone 
producing Sorangium cellulosum host cell in which the coding sequence for the AT 
domain of module 4 encoded by the epoD gene has been altered by homologous 
recombination using a recombinant DNA vector of the invention to encode an AT domain 
that binds only methylmalonyl CoA. This host cell, unable to make epothilones A, C, and 
E, is a preferred source of epothilones B, D, and F. The invention also provides 
recombinant Sorangium host cells in which both alterations and deletions of epothilone 
biosynthetic genes have been made. For example, the invention provides recombinant 
Sorangium cellulosum host cells in which both of the foregoing alteration and deletion 
have been made, producing a host cell that makes only epothilone D. 

In similar fashion, those of skill in the art will appreciate the present invention 
provides a wide variety of recombinant Sorangium cellulosum host cells that make less 
complex mixtures of the epothilones than do the wild type producing cells as well as those 
that make one or more epothilone derivatives. Such host cells include those that make only 
epothilones A, C, and E; those that make only epothilones B, D, and F; those that make 
only epothilone D; and those that make only epothilone C. 
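
The relationship between host cell genotype and the mixture of epothilones produced can be sketched as a small prediction function. This is a simplified, illustrative model and not a description of the actual enzymology: it assumes that the module 4 AT specificity selects between the R = H series (C, A, E) and the R = CH3 series (D, B, F), that the epoK epoxidase converts C and D to A and B, and that a downstream hydroxylation step (an assumption of this sketch) converts A and B to E and F. The function name and parameters are hypothetical.

```python
# Desoxy, epoxide, and hydroxylated members of each series (assumed model).
SERIES = {
    "malonyl": ("C", "A", "E"),        # R = H
    "methylmalonyl": ("D", "B", "F"),  # R = CH3
}


def epothilones_produced(at4_substrates, epoK_active=True, hydroxylation=True):
    """Predict the epothilones made by a host cell.

    at4_substrates: extender units accepted by the module 4 AT domain.
    epoK_active:    whether the epoK epoxidase gene product is made.
    hydroxylation:  whether A and B are further converted to E and F.
    """
    products = set()
    for substrate in at4_substrates:
        desoxy, epoxide, hydroxy = SERIES[substrate]
        products.add(desoxy)
        if epoK_active:
            products.add(epoxide)
            if hydroxylation:
                products.add(hydroxy)
    return sorted(products)


both = ("malonyl", "methylmalonyl")
print(epothilones_produced(both))                                  # unaltered host
print(epothilones_produced(both, epoK_active=False))               # epoK deleted
print(epothilones_produced(("methylmalonyl",)))                    # altered module 4 AT
print(epothilones_produced(("methylmalonyl",), epoK_active=False)) # both changes
```

Within this model, the four genotypes give the full mixture, epothilones C and D, epothilones B, D, and F, and epothilone D alone, respectively, in agreement with the host cells described above.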

In another preferred embodiment, the present invention provides expression 
vectors and recombinant Myxococcus, preferably M. xanthus, host cells containing those 
expression vectors that express a recombinant epothilone PKS or a PKS for an epothilone 
derivative. Presently, vectors that replicate extrachromosomally in M. xanthus are not 
known. There are, however, a number of phage known to integrate into M. xanthus 
chromosomal DNA, including Mx8, Mx9, Mx81, and Mx82. The integration and 
attachment function of these phages can be placed on plasmids to create phage-based 
expression vectors that integrate into the M. xanthus chromosomal DNA. Of these, phage 
Mx9 and Mx8 are preferred for purposes of the present invention. Plasmid pPLH343, 
described in Salmi et al., Feb. 1998, Genetic determinants of immunity and integration of 
temperate Myxococcus xanthus phage Mx8, J. Bact. 180(3): 614-621, is a plasmid that 
replicates in E. coli and comprises the phage Mx8 genes that encode the attachment and 
integration functions. 

The promoter of the epothilone PKS gene functions in Myxococcus xanthus host 
cells. Thus, in one embodiment, the present invention provides a recombinant promoter for 
use in recombinant host cells derived from the promoter of the Sorangium cellulosum 
epothilone PKS gene. The promoter can be used to drive expression of one or more 
epothilone PKS genes or another useful gene product in recombinant host cells. The 
invention also provides an epothilone PKS expression vector in which one or more of the 
epothilone PKS or epothilone modification enzyme genes are under the control of their 
own promoter. Another preferred promoter for use in Myxococcus xanthus host cells for 
purposes of expressing a recombinant PKS of the invention is the promoter of the pilA 
gene of M. xanthus. This promoter, as well as two M. xanthus strains that express high 
levels of gene products from genes controlled by the pilA promoter, a pilA deletion strain 
and a pilS deletion strain, are described in Wu and Kaiser, Dec. 1997, Regulation of 
expression of the pilA gene in Myxococcus xanthus, J. Bact. 179(24): 7748-7758, 
incorporated herein by reference. Optionally, the invention provides recombinant 
Myxococcus host cells comprising both the pilA and pilS deletions. Another preferred 
promoter is the starvation dependent promoter of the sdeK gene. 

Selectable markers for use in Myxococcus xanthus include kanamycin, tetracycline, 
chloramphenicol, zeocin, spectinomycin, and streptomycin resistance conferring genes. 

The recombinant DNA expression vectors of the invention for use in Myxococcus 
typically include such a selectable marker and may further comprise the promoter derived 
from an epothilone PKS or epothilone modification enzyme gene. 

The present invention provides preferred expression vectors for use in preparing 
the recombinant Myxococcus xanthus expression vectors and host cells of the invention. 
These vectors, designated plasmids pKOS35-82.1 and pKOS35-82.2 (Figure 3), are able to 
replicate in E. coli host cells as well as integrate into the chromosomal DNA of 
M. xanthus. The vectors comprise the Mx8 attachment and integration genes as well as the 
pilA promoter with restriction enzyme recognition sites placed conveniently downstream. 
The two vectors differ from one another merely in the orientation of the pilA promoter on 
the vector and can be readily modified to include the epothilone PKS and modification 
enzyme genes of the invention. The construction of the vectors is described in Example 2. 

Especially preferred Myxococcus host cells of the invention are those that produce 
an epothilone or epothilone derivative or mixtures of epothilones or epothilone derivatives 
at equal to or greater than 20 mg/L, more preferably at equal to or greater than 200 mg/L, 
and most preferably at equal to or greater than 1 g/L. Especially preferred are M. xanthus 
host cells that produce at these levels. M. xanthus host cells that can be employed for 
purposes of the invention include the DZ1 (Campos et al., 1978, J. Mol. Biol. 119: 167-178, 
incorporated herein by reference), the TA-producing cell line ATCC 31046, DK1219 
(Hodgkin and Kaiser, 1979, Mol. Gen. Genet. 171: 177-191, incorporated herein by 
reference), and the DK1622 cell lines (Kaiser, 1979, Proc. Natl. Acad. Sci. USA 76: 5952-5956, 
incorporated herein by reference). 

In another preferred embodiment, the present invention provides expression 
vectors and recombinant Pseudomonas fluorescens host cells that contain those expression 
vectors and express a recombinant PKS of the invention. A plasmid for use in constructing 
the P. fluorescens expression vectors and host cells of the invention is plasmid pRSF1010, 
which replicates in E. coli and P. fluorescens host cells (see Scholz et al., 1989, Gene 
75: 271-8, incorporated herein by reference). Low copy number replicons and vectors can 
also be used. As noted above, the invention also provides the promoter of the Sorangium 
cellulosum epothilone PKS and epothilone modification enzyme genes in recombinant 
form. The promoter can be used to drive expression of an epothilone PKS gene or other 
gene in P. fluorescens host cells. Also, the promoter of the soraphen PKS genes can be 
used in any host cell in which a Sorangium promoter functions. Thus, in one embodiment, 
the present invention provides an epothilone PKS expression vector for use in P. 
fluorescens host cells. 

In another preferred embodiment, the expression vectors of the invention are used 
to construct recombinant Streptomyces host cells that express a recombinant PKS of the 
invention. Streptomyces host cells useful in accordance with the invention include 
S. coelicolor, S. lividans, S. venezuelae, S. ambofaciens, S. fradiae, and the like. Preferred 
Streptomyces host cell/vector combinations of the invention include S. coelicolor CH999 
and S. lividans K4-114 and K4-155 host cells, which do not produce actinorhodin, and 
expression vectors derived from the pRM1 and pRM5 vectors, as described in U.S. Patent 
No. 5,830,750 and U.S. patent application Serial Nos. 08/828,898, filed 31 Mar. 1997, and 
09/181,833, filed 28 Oct. 1998. Especially preferred Streptomyces host cells of the 
invention are those that produce an epothilone or epothilone derivative or mixtures of 
epothilones or epothilone derivatives at equal to or greater than 20 mg/L, more preferably 
at equal to or greater than 200 mg/L, and most preferably at equal to or greater than 1 g/L. 
Especially preferred are S. coelicolor and S. lividans host cells that produce at these levels. 
Also, species of the closely related genus Saccharopolyspora can be used to produce 
epothilones, including but not limited to S. erythraea. 

The present invention provides a wide variety of expression vectors for use in 
Streptomyces. For replicating vectors, the origin of replication can be, for example and 
without limitation, a low copy number replicon and vectors comprising the same, such as 
SCP2* (see Hopwood et al., Genetic Manipulation of Streptomyces: A Laboratory Manual 
(The John Innes Foundation, Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235, 
and Kieser and Melton, 1988, Gene 65: 83-91, each of which is incorporated herein 
by reference), SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by 
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and Bierman 
et al., 1992, Gene 116: 43-49, each of which is incorporated herein by reference), or a high 
copy number replicon and vectors comprising the same, such as pIJ101 and pJV1 (see 
Katz et al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171: 
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is 
incorporated herein by reference). High copy number vectors are generally, however, not 
preferred for expression of large genes or multiple genes. For non-replicating and 
integrating vectors and generally for any vector, it is useful to include at least an E. coli 
origin of replication, such as from pUC, plP, pll, and pBR. For phage based vectors, the 
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al., supra). 
Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211, all of which 
integrate site-specifically in the chromosomal DNA of S. lividans, can be employed. 

Typically, the expression vector will comprise one or more marker genes by which 
host cells containing the vector can be identified and/or selected. Useful antibiotic 
resistance conferring genes for use in Streptomyces host cells include the ermE (confers 
resistance to erythromycin and lincomycin), tsr (confers resistance to thiostrepton), aadA 
(confers resistance to spectinomycin and streptomycin), aacC4 (confers resistance to 
apramycin, kanamycin, gentamicin, geneticin (G418), and neomycin), hyg (confers 
resistance to hygromycin), and vph (confers resistance to viomycin) resistance conferring 
genes. 

The recombinant PKS gene on the vector will be under the control of a promoter, 
typically with an attendant ribosome binding site sequence. A preferred promoter is the 
actI promoter and its attendant activator gene actII-ORF4, which is provided in the pRM1 
and pRM5 expression vectors, supra. This promoter is activated in the stationary phase of 
growth when secondary metabolites are normally synthesized. Other useful Streptomyces 
promoters include without limitation those from the ermE gene and the melC1 gene, 
which act constitutively, and the tipA gene and the merA gene, which can be induced at 
any growth stage. In addition, the T7 RNA polymerase system has been transferred to 
Streptomyces and can be employed in the vectors and host cells of the invention. In this 
system, the coding sequence for the T7 RNA polymerase is inserted into a neutral site of 
the chromosome or in a vector under the control of the inducible merA promoter, and the 
gene of interest is placed under the control of the T7 promoter. As noted above, one or 
more activator genes can also be employed to enhance the activity of a promoter. 
Activator genes in addition to the actII-ORF4 gene discussed above include dnrI, redD, 
and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra), which can be 
employed with their cognate promoters to drive expression of a recombinant gene of the 
invention. 

The present invention also provides recombinant expression vectors that drive 
expression of the epothilone PKS and PKS enzymes that produce epothilone or epothilone 
derivatives in plant cells. Such vectors are constructed in accordance with the teachings in 
U.S. patent application Serial No. 09/114,083, filed 10 July 1998, and PCT patent 
publication No. 99/02669, each of which is incorporated herein by reference. Plants and 
plant cells expressing epothilone are disease resistant and able to resist fungal infection. 
For improved production of an epothilone or epothilone derivative in any heterologous 
host cells, including plant, Myxococcus, Pseudomonas, and Streptomyces host cells, one 
can also transform the cell to express a heterologous phosphopantetheinyl transferase. See 
U.S. patent application Serial No. 08/728,742, filed 11 Oct. 1996, and PCT patent 
publication No. 97/13845, both of which are incorporated herein by reference. 

In addition to providing recombinant expression vectors that encode the epothilone 
or an epothilone derivative PKS, the present invention also provides, as discussed above, 
DNA compounds that encode epothilone modification enzyme genes. As discussed above, 
these gene products convert epothilones C and D to epothilones A and B, and convert 
epothilones A and B to epothilones E and F. The present invention also provides 
recombinant expression vectors and host cells transformed with those vectors that express 
any one or more of those genes and so produce the corresponding epothilone or epothilone 
derivative. In one aspect, the present invention provides the epoK gene in recombinant 
form and host cells that express the gene product thereof, which converts epothilones C 
and D to epothilones A and B, respectively. 

In another important embodiment, and as noted above, the present invention 
provides vectors for disrupting the function of any one or more of the epoL, epoK, and any 
of the ORFs associated with the epothilone PKS gene cluster in Sorangium cells. The 
invention also provides recombinant Sorangium host cells lacking (or containing 
inactivated forms of) any one or more of these genes. These cells can be used to produce 
the corresponding epothilones and epothilone derivatives that result from the absence of 
any one or more of these genes. 

The invention also provides non-Sorangium host cells that contain a recombinant 
epothilone PKS or a PKS for an epothilone derivative but do not contain (or contain 
non-functional forms of) any epothilone modification enzyme genes. These host cells of the 
invention are expected to produce epothilones G and H in the absence of a dehydratase 
activity capable of forming the C-12-C-13 alkene of epothilones C and D. This 
dehydration reaction is believed to take place in the absence of the epoL gene product in 
Streptomyces host cells. The host cells produce epothilones C and D (or the corresponding 
epothilone C and D derivative) when the dehydratase activity is present and the P450 
epoxidase and hydroxylase (that converts epothilones A and B to epothilones E and F, 
respectively) genes are absent. The host cells also produce epothilones A and B (or the 
corresponding epothilone A and B derivatives) when the hydroxylase gene only is absent. 
Preferred for expression in these host cells are the recombinant epothilone PKS enzymes of 
the invention that contain the hybrid module 4 with an AT specific for methylmalonyl 
CoA only, optionally in combination with one or more additional hybrid modules. Also 
preferred for expression in these host cells are the recombinant epothilone PKS enzymes of 
the invention that contain the hybrid module 4 with an AT specific for malonyl CoA only, 
optionally in combination with one or more additional hybrid modules. 

The recombinant host cells of the invention can also include other genes and 
corresponding gene products that enhance production of a desired epothilone or epothilone 
derivative. As but one non-limiting example, the epothilone PKS proteins require 
phosphopantetheinylation of the ACP domains of the loading domain and modules 2 
through 9 as well as of the PCP domain of the NRPS. Phosphopantetheinylation is 
mediated by enzymes that are called phosphopantetheinyl transferases (PPTases). To 
produce functional PKS enzyme in host cells that do not naturally express a PPTase able 
to act on the desired PKS enzyme or to increase amounts of functional PKS enzyme in 
host cells in which the PPTase is rate-limiting, one can introduce a heterologous PPTase, 
including but not limited to Sfp, as described in PCT Pat. Pub. Nos. 97/13845 and 
98/27203, and U.S. patent application Serial Nos. 08/728,742, filed 11 Oct. 1996, and 
08/989,332, each of which is incorporated herein by reference. 

The host cells of the invention can be grown and fermented under conditions 
known in the art for other purposes to produce the compounds of the invention. The 
compounds of the invention can be isolated from the fermentation broths of these cultured 
cells and purified by standard procedures. Fermentation conditions for producing the 
compounds of the invention from Sorangium host cells can be based on the protocols 
described in PCT patent publication Nos. 93/10121, 97/19086, 98/22461, and 99/42602, 
each of which is incorporated herein by reference. The novel epothilone analogs of the 
present invention, as well as the epothilones produced by the host cells of the invention, 
can be derivatized and formulated as described in PCT patent publication Nos. 93/10121, 
97/19086, 98/08849, 98/22461, 98/25929, 99/01124, 99/02514, 99/07692, 99/27890, 
99/39694, 99/40047, 99/42602, 99/43653, 99/43320, 99/54319, 99/54319, and 99/54330, 
and U.S. Patent No. 5,969,145, each of which is incorporated herein by reference. 

Invention Compounds 

Preferred compounds of the invention include the 14-methyl epothilone derivatives 
(made by utilization of the hybrid module 3 of the invention that has an AT that binds 
methylmalonyl CoA instead of malonyl CoA); the 8,9-dehydro epothilone derivatives 
(made by utilization of the hybrid module 6 of the invention that has a DH and KR instead 
of an ER, DH, and KR); the 10-methyl epothilone derivatives (made by utilization of the 
hybrid module 5 of the invention that has an AT that binds methylmalonyl CoA instead of 
malonyl CoA); the 9-hydroxy epothilone derivatives (made by utilization of the hybrid 
module 6 of the invention that has a KR instead of an ER, DH, and KR); the 
8-desmethyl-14-methyl epothilone derivatives (made by utilization of the hybrid module 3 of the 
invention that has an AT that binds methylmalonyl CoA instead of malonyl CoA and a 
hybrid module 6 that binds malonyl CoA instead of methylmalonyl CoA); and the 
8-desmethyl-8,9-dehydro epothilone derivatives (made by utilization of the hybrid module 6 
of the invention that has a DH and KR instead of an ER, DH, and KR and an AT that 
specifies malonyl CoA instead of methylmalonyl CoA). 

More generally, preferred epothilone derivative compounds of the invention are 
those that can be produced by altering the epothilone PKS genes as described herein and 
optionally by action of epothilone modification enzymes and/or by chemically modifying 
the resulting epothilones produced when those genes are expressed. Thus, the present 
invention provides compounds of the formula: 



including the glycosylated forms thereof and stereoisomeric forms where the 
stereochemistry is not shown, 

wherein A is a substituted or unsubstituted straight, branched chain or cyclic alkyl, 
alkenyl or alkynyl residue optionally containing 1-3 heteroatoms selected from O, S and 
N; or wherein A comprises a substituted or unsubstituted aromatic residue; 

R² represents H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

X⁵ represents =O or a derivative thereof, or H,OH or H,NR₂ wherein R is H, or 
alkyl, or acyl or H,OCOR or H,OCONR₂ wherein R is H or alkyl, or is H,H; 

R⁶ represents H or lower alkyl, and the remaining substituent on the corresponding 
carbon is H; 

X⁷ represents OR, NR₂, wherein R is H, or alkyl or acyl or is OCOR, or OCONR₂ 
wherein R is H or alkyl or X⁷ taken together with X⁹ forms a carbonate or carbamate 
cycle, and wherein the remaining substituent on the corresponding carbon is H; 

R⁸ represents H or lower alkyl and the remaining substituent on the carbon is H; 

X⁹ represents =O or a derivative thereof, or is H,OR or H,NR₂, wherein R is H, or 
alkyl or acyl or is H,OCOR or H,OCONR₂ wherein R is H or alkyl, or represents H,H or 
wherein X⁹ together with X⁷ or with X¹¹ can form a cyclic carbonate or carbamate; 

R¹⁰ is H,H or H,lower alkyl, or lower alkyl,lower alkyl; 

X¹¹ is =O or a derivative thereof, or is H,OR, or H,NR₂ wherein R is H, or alkyl or 
acyl or is H,OCOR or H,OCONR₂ wherein R is H or alkyl, or is H,H or wherein X¹¹ in 
combination with X⁹ may form a cyclic carbonate or carbamate; 

R¹² is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

X¹³ is =O or a derivative thereof, or H,OR or H,NR₂ wherein R is H, alkyl or acyl 
or is H,OCOR or H,OCONR₂ wherein R is H or alkyl; 

R¹⁴ is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

R¹⁶ is H or lower alkyl; and 

wherein optionally H or another substituent may be removed from positions 12 and 
13 and/or 8 and 9 to form a double bond, wherein said double bond may optionally be 
converted to an epoxide. 

Particularly preferred are compounds of the formulas 

[Structural formulas not reproduced; legible labels include 1(b).]

wherein the noted substituents are as defined above. 

Especially preferred are compounds of the formulas 


[Structural formulas not reproduced; legible labels include 1(e).]

wherein both Z are O or one Z is N and the other Z is O, and the remaining substituents 
are as defined above. 

As used herein, a substituent which “comprises an aromatic moiety” contains at 
least one aromatic ring, such as phenyl, pyridyl, pyrimidyl, thiophenyl, or thiazolyl. The 
substituent may also include fused aromatic residues such as naphthyl, indolyl, 
benzothiazolyl, and the like. The aromatic moiety may also be fused to a nonaromatic ring 
and/or may be coupled to the remainder of the compound in which it is a substituent 
through a nonaromatic, for example, alkylene residue. The aromatic moiety may be 
substituted or unsubstituted as may the remainder of the substituent. 

WO 00/31247    PCT/US99/27438

Preferred embodiments of A include the “R” groups shown in Figure 2. 

As used herein, the term alkyl refers to a C1-C8 saturated, straight or branched 
chain hydrocarbon radical derived from a hydrocarbon moiety by removal of a single 
hydrogen atom. Alkenyl and alkynyl refer to the corresponding unsaturated forms. 
Examples of alkyl include but are not limited to methyl, ethyl, propyl, isopropyl, n-butyl, 
tert-butyl, neopentyl, i-hexyl, n-heptyl, n-octyl. Lower alkyl (or alkenyl or alkynyl) refers 
to a 1-4C radical. Methyl is preferred. Acyl refers to alkylCO, alkenylCO or alkynylCO. 

The terms halo and halogen as used herein refer to an atom selected from fluorine, 
chlorine, bromine, and iodine. The term haloalkyl as used herein denotes an alkyl group to 
which one, two, or three halogen atoms are attached to any one carbon and includes 
without limitation chloromethyl, bromoethyl, trifluoromethyl, and the like. 

The term heteroaryl as used herein refers to a cyclic aromatic radical having from 
five to ten ring atoms of which one ring atom is selected from S, O, and N; zero, one, or 
two ring atoms are additional heteroatoms independently selected from S, O, and N; and 
the remaining ring atoms are carbon, the radical being joined to the rest of the molecule 
via any of the ring atoms, such as, for example, pyridyl, pyrazinyl, pyrimidinyl, pyrrolyl, 
pyrazolyl, imidazolyl, thiazolyl, oxazolyl, isoxazolyl, thiadiazolyl, oxadiazolyl, 
thiophenyl, furanyl, quinolinyl, isoquinolinyl, and the like. 

The term heterocycle includes but is not limited to pyrrolidinyl, pyrazolinyl, 
pyrazolidinyl, imidazolinyl, imidazolidinyl, piperidinyl, piperazinyl, oxazolidinyl, 
isoxazolidinyl, morpholinyl, thiazolidinyl, isothiazolidinyl, and tetrahydrofuryl. 

The term “substituted” as used herein refers to a group substituted by independent 
replacement of any of the hydrogen atoms thereon with, for example, Cl, Br, F, I, OH, CN, 
alkyl, alkoxy, alkoxy substituted with aryl, haloalkyl, alkylthio, amino, alkylamino, 
dialkylamino, mercapto, nitro, carboxaldehyde, carboxy, alkoxycarbonyl, or carboxamide. 
Any one substituent may be an aryl, heteroaryl, or heterocycloalkyl group. 

It will be apparent that the nature of the substituents at positions 2, 4, 6, 8, 10, 12, 14 
and 16 in formula (1) is determined at least initially by the specificity of the AT catalytic 
domain of modules 9, 8, 7, 6, 5, 4, 3 and 2, respectively. Because AT domains that accept 
malonyl CoA, methylmalonyl CoA, ethylmalonyl CoA (and in general, lower alkyl 
malonyl CoA), as well as hydroxymalonyl CoA, are available, one of the substituents at 
these positions may be H, and the other may be H, lower alkyl, especially methyl and 
ethyl, or OH. Further reaction at these positions, e.g., a methyl transferase reaction such as 
that catalyzed by module 8 of the epothilone PKS, may be used to replace H at these 
positions as well. Further, an H,OH embodiment may be oxidized to =O or, with the 
adjacent ring C, be dehydrated to form a π-bond. Both OH and =O are readily derivatized 
as further described below. 
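
The correspondence between ring positions, PKS modules, and AT substrate specificity described above can be written out as a small lookup. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and the substituent table simply restates the extender units named in the preceding paragraph.

```python
# Illustrative lookup; names are hypothetical and not part of the specification.

# Ring positions 2, 4, ..., 16 are set by the AT domains of modules 9, 8, ..., 2.
POSITION_TO_MODULE = dict(zip(range(2, 17, 2), range(9, 1, -1)))

# Non-hydrogen substituent introduced by each extender unit an AT may accept.
EXTENDER_TO_SUBSTITUENT = {
    "malonyl": "H",
    "methylmalonyl": "methyl",
    "ethylmalonyl": "ethyl",
    "hydroxymalonyl": "OH",
}


def module_for_position(position):
    """Return the module whose AT domain determines the given ring position."""
    return POSITION_TO_MODULE[position]


def substituent_for(position, extender):
    """Return (module, substituent) for an AT of the given extender specificity."""
    return module_for_position(position), EXTENDER_TO_SUBSTITUENT[extender]


for pos in sorted(POSITION_TO_MODULE):
    print(f"position {pos:2d} <- AT of module {POSITION_TO_MODULE[pos]}")

print(substituent_for(14, "methylmalonyl"))
print(substituent_for(8, "malonyl"))
```

The last two calls correspond to the 14-methyl derivatives made with the hybrid module 3 and the 8-desmethyl derivatives made with the hybrid module 6.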

Thus, a wide variety of embodiments of R², R⁶, R⁸, R¹⁰, R¹², R¹⁴ and R¹⁶ is 
synthetically available. The restrictions set forth with regard to embodiments of these 
substituents set forth in the definitions with respect to Formula (1) above reflect the 
information described in the SAR description in Example 8 below. 

Similarly, β-carbonyl modifications (or absence of modification) can readily be 
controlled by modifying the epothilone PKS gene cluster to include the appropriate 
sequences in the corresponding positions of the epothilone gene cluster which will or will 
not contain active KR, DH and/or ER domains. Thus, the embodiments of X⁵, X⁷, X⁹, X¹¹ 
and X¹³ synthetically available are numerous, including the formation of π-bonds with the 
adjacent ring positions. 
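
How the complement of active reductive domains fixes the state of a β-carbon can be sketched as follows. This is a simplified model that assumes the usual processing order for modular PKS enzymes (KR, then DH, then ER, each acting on the product of the previous step); the function name and return strings are hypothetical.

```python
# Simplified model of beta-carbonyl processing; the function name is hypothetical.
ALLOWED_DOMAINS = {"KR", "DH", "ER"}


def beta_carbon_state(active_domains):
    """Map the active reductive domains of a module to the beta-carbon outcome."""
    domains = set(active_domains)
    unknown = domains - ALLOWED_DOMAINS
    if unknown:
        raise ValueError(f"unrecognized domains: {sorted(unknown)}")
    if "KR" not in domains:
        return "=O"        # keto group left unreduced
    if "DH" not in domains:
        return "H,OH"      # ketoreduction only
    if "ER" not in domains:
        return "pi-bond"   # ketoreduction followed by dehydration
    return "H,H"           # full reduction to methylene


for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
    print(combo, "->", beta_carbon_state(combo))
```

In this model a module 6 carrying only a KR gives the hydroxyl state, and a module 6 carrying a DH and KR gives the double bond, matching the 9-hydroxy and 8,9-dehydro derivatives described earlier.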

Positions occupied by OH are readily converted to ethers or esters by means well 
known in the art; protection of OH at positions not to be derivatized may be required. 
Further, a hydroxyl may be converted to a leaving group, such as a tosylate, and replaced 
by an amino or halo substituent. A wide variety of “hydroxyl derivatives” such as those 
discussed above is known in the art. 

Similarly, ring positions which contain oxo groups may be converted to “carbonyl 
derivatives” such as oximes, ketals, and the like. Initial reaction products with the oxo 
moieties may be further reacted to obtain more complex derivatives. As described in 
Example 8, such derivatives may ultimately result in a cyclic substituent linking two ring 
positions. 

The enzymes useful in modification of the polyketide initially synthesized, such as 
transmethylases, dehydratases, oxidases, glycosylation enzymes and the like, can be 
supplied endogenously by a host cell when the polyketide is synthesized intracellularly, by 
modifying a host to contain the recombinant materials for the production of these 
modifying enzymes, or can be supplied in a cell-free system, either in purified forms or as 
relatively crude extracts. Thus, for example, the epoxidation of the π-bond at position 
12-13 may be effected using the protein product of the epoK gene directly in vitro. 

The nature of A is most conveniently controlled by employing an epothilone PKS 
which comprises an inactivated module 1 NRPS (using a module 2 substrate) or a KS2 
knockout (using a module 3 substrate) as described in Example 6, hereinbelow. Limited 
variation can be obtained by altering the AT catalytic specificity of the loading module; 
further variation is accomplished by replacing the NRPS of module 1 with an NRPS of 
different specificity or with a conventional PKS module. However, at present, variants are 
more readily prepared by feeding the synthetic module 2 substrate precursors and module 
3 substrate precursors to the appropriately altered epothilone PKS as described in Example 
6. 


Pharmaceutical Compositions 

The compounds can be readily formulated to provide the pharmaceutical 
compositions of the invention. The pharmaceutical compositions of the invention can be 
used in the form of a pharmaceutical preparation, for example, in solid, semisolid, or 
liquid form. This preparation will contain one or more of the compounds of the invention 
as an active ingredient in admixture with an organic or inorganic carrier or excipient 
suitable for external, enteral, or parenteral application. The active ingredient may be 
compounded, for example, with the usual non-toxic, pharmaceutically acceptable carriers 
for tablets, pellets, capsules, suppositories, pessaries, solutions, emulsions, suspensions, 
and any other form suitable for use. 

The carriers which can be used include water, glucose, lactose, gum acacia, gelatin, 
mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, colloidal silica, 
potato starch, urea, and other carriers suitable for use in manufacturing preparations, in 
solid, semi-solid, or liquified form. In addition, auxiliary stabilizing, thickening, and 
coloring agents and perfumes may be used. For example, the compounds of the invention 
may be utilized with hydroxypropyl methylcellulose essentially as described in U.S. Patent 
No. 4,916,138, incorporated herein by reference, or with a surfactant essentially as 
described in EPO patent publication No. 428,169, incorporated herein by reference. 

Oral dosage forms may be prepared essentially as described by Hondo et al., 1987, 
Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein by reference. 
Dosage forms for external application may be prepared essentially as described in EPO 
patent publication No. 423,714, incorporated herein by reference. The active compound is 
included in the pharmaceutical composition in an amount sufficient to produce the desired 
effect upon the disease process or condition. 

For the treatment of conditions and diseases caused by infection, immune system 
disorder (or to suppress immune function), or cancer, a compound of the invention may be 
administered orally, topically, parenterally, by inhalation spray, or rectally in dosage unit 
formulations containing conventional non-toxic pharmaceutically acceptable carriers, 
adjuvants, and vehicles. The term parenteral, as used herein, includes subcutaneous 
injections, and intravenous, intrathecal, intramuscular, and intrasternal injection or 
infusion techniques. 

Dosage levels of the compounds of the present invention are of the order from 
about 0.01 mg to about 100 mg per kilogram of body weight per day, preferably from 
about 0.1 mg to about 50 mg per kilogram of body weight per day. The dosage levels are 
useful in the treatment of the above-indicated conditions (from about 0.7 mg to about 3.5 
mg per patient per day, assuming a 70 kg patient). In addition, the compounds of the 
present invention may be administered on an intermittent basis, i.e., at semi-weekly, 
weekly, semi-monthly, or monthly intervals. 
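
Converting a per-kilogram dosage level into a per-patient daily amount is a multiplication by body weight. The short Python sketch below shows that arithmetic for the per-kilogram ranges given above and the 70 kg reference patient; the function name and default argument are hypothetical and serve only as an illustration, not as dosing guidance.

```python
# Illustrative arithmetic only; the function name and default are hypothetical.
def daily_dose_mg(dose_mg_per_kg, body_weight_kg=70.0):
    """Return the daily amount in mg for a per-kilogram dosage level."""
    return dose_mg_per_kg * body_weight_kg


ranges_mg_per_kg = {
    "general": (0.01, 100.0),
    "preferred": (0.1, 50.0),
}

for label, (low, high) in ranges_mg_per_kg.items():
    print(f"{label}: {daily_dose_mg(low):g} mg to {daily_dose_mg(high):g} mg per day")
```

Passing a different body_weight_kg gives the corresponding amounts for a patient of another weight.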

The amount of active ingredient that may be combined with the carrier materials to 
produce a single dosage form will vary depending upon the host treated and the particular 
mode of administration. For example, a formulation intended for oral administration to 
humans may contain from 0.5 mg to 5 gm of active agent compounded with an appropriate 
and convenient amount of carrier material, which may vary from about 5 percent to about 
95 percent of the total composition. Dosage unit forms will generally contain from about 
0.5 mg to about 500 mg of active ingredient. For external administration, the compounds 
of the invention may be formulated within the range of, for example, 0.00001% to 60% by 
weight, preferably from 0.001% to 10% by weight, and most preferably from about 
0.005% to 0.8% by weight. 

It will be understood, however, that the specific dose level for any particular 
patient will depend on a variety of factors. These factors include the activity of the specific 
compound employed; the age, body weight, general health, sex, and diet of the subject; the 
time and route of administration and the rate of excretion of the drug; whether a drug 
combination is employed in the treatment; and the severity of the particular disease or 
condition for which therapy is sought. 

A detailed description of the invention having been provided above, the following 
examples are given for the purpose of illustrating the present invention and shall not be 
construed as being a limitation on the scope of the invention or claims. 

Example 1 

DNA Sequencing of Cosmid Clones and Subclones Thereof 
The epothilone producing strain, Sorangium cellulosum SMP44, was grown on a
cellulose-containing medium (see Bollag et al., 1995, Cancer Research 55: 2325-2333,
incorporated herein by reference), and epothilone production was confirmed by LC/MS
analysis of the culture supernatant. Total DNA was prepared from this strain using the
procedure described by Jaoua et al., 1992, Plasmid 28: 157-165, incorporated herein by
reference. To prepare a cosmid library, S. cellulosum genomic DNA was partially digested
with Sau3AI and ligated with BamHI-digested pSupercos (Stratagene). The DNA was
packaged in lambda phage as recommended by the manufacturer, and the mixture was then
used to infect E. coli XL1-Blue MR cells. This procedure yielded approximately 3,000
isolated colonies on LB-ampicillin plates. Because the size of the S. cellulosum genome is
estimated to be circa 10^7 nucleotides, the DNA inserts present among 3000 colonies would
correspond to circa 10 S. cellulosum genomes.
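
The coverage estimate above is simple arithmetic, sketched here in Python. The average insert size is not stated in the text; the 35 kb value below is an assumption (a size typical of cosmid inserts) used only for illustration.

```python
def library_coverage(n_clones: int, avg_insert_bp: float, genome_bp: float) -> float:
    """Return the number of genome equivalents represented by a clone library."""
    return n_clones * avg_insert_bp / genome_bp


# 3,000 colonies and a genome of about 1e7 nucleotides come from the text.
# The 35 kb average insert size is an assumed value, not a reported one.
coverage = library_coverage(n_clones=3000, avg_insert_bp=35_000, genome_bp=1e7)
print(f"approximate genome equivalents: {coverage:.1f}")
```

Under that assumption the library represents roughly ten genome equivalents, in line with the estimate given above.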

To screen the library, two segments of KS domains were used to design
oligonucleotide primers for a PCR with Sorangium cellulosum genomic DNA as template.
The fragment generated was then used as a probe to screen the library. This approach was
chosen because it was found, from the examination of over a dozen PKS genes, that KS
domains are the most highly conserved (at the amino acid level) of all the PKS domains
examined. Therefore, it was expected that the probes produced would detect not only the
epothilone PKS genes but also other PKS gene clusters represented in the library. The two
degenerate oligonucleotides synthesized using conserved regions within the ketosynthase
(KS) domains compiled from the DEBS and soraphen PKS gene sequences were (standard
nomenclature for degenerate positions is used): CTSGTSKCSSTBCACCTSGCSTGC and
TGAYRTGSGCGTTSGTSCCGSWGA. A single band of ~750 bp, corresponding to the
predicted size, was seen in an agarose gel after PCR employing the oligos as primers and
S. cellulosum SMP44 genomic DNA as template. The fragment was removed from the gel
and cloned in the HincII site of pUC118 (which is a derivative of pUC18 with an insert
sequence for making single stranded DNA). After transformation of E. coli, plasmid DNA
from ten independent clones was isolated and sequenced. The analysis revealed nine
unique sequences that each corresponded to a common segment of KS domains in PKS
genes. Of the nine, three were identical to a polyketide synthase gene cluster previously
isolated from this organism and determined not to belong to the epothilone gene cluster
from the analysis of the modules. The remaining six KS fragments were excised from the
vector, pooled, end-labeled with 32P, and used as probe in hybridizations with the colonies
containing the cosmid library under high stringency conditions.
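
The degenerate positions in the two oligonucleotides above follow the IUPAC nucleotide codes. As an illustrative sketch (not part of the described procedure), the following Python computes how many distinct sequences each degenerate oligonucleotide represents and converts it to a regular expression for matching. The labels `oligo_1` and `oligo_2` are hypothetical names introduced here.

```python
import re
from math import prod

# IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def to_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regular expression."""
    parts = [b if len(IUPAC[b]) == 1 else f"[{IUPAC[b]}]" for b in primer]
    return re.compile("".join(parts))


# Sequences are taken from the text; the dictionary keys are hypothetical labels.
PRIMERS = {
    "oligo_1": "CTSGTSKCSSTBCACCTSGCSTGC",
    "oligo_2": "TGAYRTGSGCGTTSGTSCCGSWGA",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

The compiled pattern can be applied with `to_regex(seq).search(template)` to look for a candidate binding site on one strand of a template sequence.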

The screen identified 15 cosmids that hybridized to the pooled KS probes. DNA
was prepared from each cosmid, digested with NotI, separated on an agarose gel, and
transferred to a nitrocellulose membrane for Southern hybridization using the pooled KS
fragments as probe. The results revealed that two of the cosmids did not contain
KS-hybridizing inserts, leaving 13 cosmids to analyze further. The blot was stripped of the
label and re-probed, under less stringent conditions, with labeled DNA containing the
sequence corresponding to the enoylreductase domain from module four of the DEBS
gene cluster. Because it was anticipated that the epothilone PKS gene cluster would
encode two consecutive modules that contain an ER domain, and because not all PKS
gene clusters have ER domain-containing modules, hybridization with the ER probe was
predicted to identify cosmids containing insert DNA from the epothilone PKS gene
cluster. Two cosmids were found to hybridize strongly to the ER probe, one hybridized
moderately, and a final cosmid hybridized weakly. Analysis of the restriction pattern of
the NotI fragments indicated that the two cosmids that hybridized strongly with the ER
probe overlapped one another. The nucleotide sequence was also obtained from the ends
of each of the 13 cosmids using the T7 and T3 primer binding sites. All contained
sequences that showed homology to PKS genes. Sequence from one of the cosmids that
hybridized strongly to the ER probe showed homology to NRPSs and, in particular, to the
adenylation domain of an NRPS. Because it was anticipated that the thiazole moiety of
epothilone might be derived from the formation of an amide bond between an acetate and
cysteine molecule (with a subsequent cyclization step), the presence of an NRPS domain
in a cosmid that also contained ER domain(s) supported the prediction that this cosmid
might contain all or part of the epothilone PKS gene cluster.

Preliminary restriction analysis of the 12 remaining cosmids suggested that three
might overlap with the cosmid of interest. To verify this, oligonucleotides were
synthesized for each end of the four cosmids (determined from the end sequencing
described above) and used as primer sets in PCRs with each of the four cosmid DNAs.
Overlap would be indicated by the appearance of a band from a non-cognate
primer-template reaction. The results of this experiment verified that two of the cosmids
overlapped with the cosmid containing the NRPS. Restriction mapping of the three
cosmids revealed that the cosmids did, in fact, overlap. Furthermore, because PKS
sequences extended to the end of the insert in the last overlapping fragment, based on the
assumption that the NRPS would map to the 5'-end of the cluster, the results also indicated
that the 3' end of the gene cluster had not been isolated among the clones identified.
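
The logic of the cross-PCR overlap test can be sketched as follows: a band from a non-cognate primer-template reaction is taken as evidence that two cosmids share sequence. The cosmid labels A to D and the band results below are hypothetical placeholders, not data from the experiment.

```python
from itertools import combinations


def overlapping_pairs(bands):
    """Return cosmid pairs linked by a non-cognate PCR product.

    `bands` maps (primer_source, template) to True when primers designed
    from the ends of `primer_source` gave a band on `template` DNA.
    """
    cosmids = sorted({name for pair in bands for name in pair})
    return [
        (a, b)
        for a, b in combinations(cosmids, 2)
        if bands.get((a, b), False) or bands.get((b, a), False)
    ]


# Hypothetical results for four cosmids; "A" stands for the NRPS-containing cosmid.
example_bands = {
    ("A", "B"): True, ("B", "A"): True,
    ("A", "C"): True, ("C", "A"): False,
    ("A", "D"): False, ("D", "A"): False,
    ("B", "C"): False, ("C", "B"): False,
    ("B", "D"): False, ("D", "B"): False,
    ("C", "D"): False, ("D", "C"): False,
}
print(overlapping_pairs(example_bands))
```

A band in either direction is treated as sufficient here; a stricter reading could require confirmation by restriction mapping, as was done above.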

To isolate the remaining segment of the epothilone biosynthesis genes, a PCR
fragment was generated from the cosmid containing the most 3'-terminal region of the
putative gene cluster. This fragment was used as a probe to screen a newly prepared
cosmid library of Sorangium cellulosum genomic DNA, again of approximately 3000
colonies. Several hybridizing clones were identified; DNA was made from six of them.
Analysis of NotI-digested fragments indicated that all contained overlapping regions. The
cosmid containing the largest insert DNA that also had the shortest overlap with the
cosmid used to make the probe was selected for further analysis.

Restriction maps were created for the four cosmids, as shown in Figure 1.
Sequence obtained from one of the ends of cosmid pKOS35-70.8A3 showed no homology
to PKS sequences or any associated modifying enzymes. Similarly, sequence from one
end of cosmid pKOS35-79.85 also did not contain sequences corresponding to a PKS
region. These findings supported the observation that the epothilone cluster was contained
within the ~70 kb region encompassed by the four cosmid inserts.

To sequence the inserts in the cosmids, each of the NotI restriction fragments from
the four cosmids was cloned into the NotI site of the commercially available pBluescript
plasmid. Initial sequencing was performed on the ends of each of the clones. Analysis of
the sequences allowed the prediction, before having the complete sequence, that there
would be 10 modules in this PKS gene cluster, a loading domain plus 9 modules.

Sequence was obtained for the complete PKS as follows. Each of the 13
non-overlapping NotI fragments was isolated and subjected to partial HinPI digestion.
Fragments of ~2 to 4 kb in length were removed from an agarose gel and cloned in the
AccI site of pUC118. Sufficient clones from each library of the NotI fragments were
sequenced to provide at least 4-fold coverage of each. To sequence across each of the
NotI sites, a set of oligos, one 5' and the other 3' to each NotI site, was made and used as
primers in PCR amplification of a fragment that contained each NotI site. Each fragment
produced in this manner was cloned and sequenced.

The nucleotide sequence was determined for a linear segment corresponding to
~72 kb. Analysis revealed a PKS gene cluster with a loading domain and nine modules.
Downstream of the PKS sequence is an ORF, designated epoK, that shows strong
homology to cytochrome P450 oxidase genes and encodes the epothilone epoxidase. The
nucleotide sequence of 15 kb downstream of epoK has also been determined: a number of
additional ORFs have been identified, but an ORF that shows homology to any known
dehydratase has not been identified. The epoL gene may encode a dehydratase activity, but
this activity may instead be resident within the epothilone PKS or encoded by another
gene.

The PKS genes are organized in 6 open reading frames. At the polypeptide level,
the loading domain and modules 1, 2, and 9 appear on individual polypeptides; their
corresponding genes are designated epoA, epoB, epoC, and epoF, respectively. Modules 3,
4, 5, and 6 are contained on a single polypeptide whose gene is designated epoD, and
modules 7 and 8 are on another polypeptide whose gene is designated epoE. It is clear
from the spacing between ORFs that epoC, epoD, epoE, and epoF constitute an operon.
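
The gene-to-module organization just described can be captured in a small lookup structure. This is only an illustrative sketch; the gene names and module assignments are those given in the text, while the dictionary and function names are introduced here.

```python
# Gene -> PKS units carried by its polypeptide, as described in the text.
EPO_PKS = {
    "epoA": ["loading"],
    "epoB": [1],
    "epoC": [2],
    "epoD": [3, 4, 5, 6],
    "epoE": [7, 8],
    "epoF": [9],
}


def gene_for_unit(unit):
    """Return the gene whose polypeptide carries the given module or 'loading'."""
    for gene, units in EPO_PKS.items():
        if unit in units:
            return gene
    raise KeyError(f"no gene carries unit {unit!r}")


for unit in ["loading", 1, 5, 8, 9]:
    print(unit, "->", gene_for_unit(unit))
```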

The epoA, epoB, and epoK genes may also be part of the large operon, but there are spaces
of approximately 100 bp between epoB and epoC and 115 bp between epoF and epoK
which could contain a promoter. The present invention provides the intergenic sequences
in recombinant form. At least one, but potentially more than one, promoter is used to
express all of the epothilone genes. The epothilone PKS gene cluster is shown
schematically below.

[Schematic of the epothilone PKS gene cluster; the figure is not recoverable from the source. Surviving labels: epoC; PKS; Load, Mod 1, Mod 2.]

A detailed examination of the modules shows an organization and composition that
is consistent with one able to be used for the biosynthesis of epothilone. The description
that follows is at the polypeptide level. The sequence of the AT domain in the loading
module and in modules 3, 4, 5, and 9 shows similarity to the consensus sequence for
malonyl loading domains, consistent with the presence of an H side chain at C-14, C-12
(epothilones A and C), C-10, and C-2, respectively, as well as the loading region. The AT
domains in modules 2, 6, 7, and 8 resemble the consensus sequence for methylmalonyl
specifying AT domains, again consistent with the presence of methyl side chains at C-16,
C-8, C-6, and C-4, respectively.
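
The correspondence between AT domain specificity and the side chain at each position can be tabulated as below. This is a sketch of the relationships stated in the paragraph above, not a prediction tool; the variable and function names are introduced here for illustration. Module 1 is omitted because it is a non-ribosomal peptide synthetase rather than a PKS module.

```python
# AT specificity per PKS module, as described in the text.
AT_SPECIFICITY = {
    2: "methylmalonyl", 3: "malonyl", 4: "malonyl", 5: "malonyl",
    6: "methylmalonyl", 7: "methylmalonyl", 8: "methylmalonyl", 9: "malonyl",
}

# Carbon whose side chain each module determines, per the text.
SIDE_CHAIN_CARBON = {
    2: "C-16", 3: "C-14", 4: "C-12", 5: "C-10",
    6: "C-8", 7: "C-6", 8: "C-4", 9: "C-2",
}

SUBSTITUENT = {"malonyl": "H", "methylmalonyl": "CH3"}


def side_chains():
    """Map each carbon to the side chain implied by the module's AT specificity."""
    return {
        SIDE_CHAIN_CARBON[m]: SUBSTITUENT[spec]
        for m, spec in AT_SPECIFICITY.items()
    }


for carbon, group in side_chains().items():
    print(carbon, group)  # C-12 is H here, matching epothilones A and C
```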

The loading module contains a KS domain in which the cysteine residue usually
present at the active site is instead a tyrosine. This domain is designated as KSY and serves
as a decarboxylase, which is part of its normal function, but cannot function as a
condensing enzyme. Thus, the loading domain is expected to load malonyl CoA, move it
to the ACP, and decarboxylate it to yield the acetyl residue required for condensation with
cysteine.

Module 1 is the non-ribosomal peptide synthetase that activates cysteine and
catalyzes the condensation with acetate on the loading module. The sequence contains
segments highly similar to ATP-binding and ATPase domains, required for activation of
amino acids, a phosphopantetheinylation site, and an elongation domain. In database
searches, module 1 shows very high similarity to a number of previously identified peptide
synthetases.

Module 2 determines the structure of epothilone at C-15 to C-17. The presence of
the DH domain in module 2 yields the C-16-17 dehydro moiety in the molecule. The
domains in module 3 are consistent with the structure of epothilone at C-14 and C-15; the
OH that comes from the action of the KR is employed in the lactonization of the molecule.

Module 4 controls the structure at C-12 and C-13 where a double bond is found in
epothilones C and D, consistent with the presence of a DH domain. Although the sequence
of the AT domain appears to resemble those that specify malonate loading, it can also load
methylmalonate, thereby accounting in part for the mixture of epothilones found in the
fermentation broths of the naturally producing organisms.

A significant departure from the expected array of functions was found in module
4. This module was expected to contain a DH domain, thereby directing the synthesis of
epothilones C and D as the products of the PKS. Rigorous analysis revealed that the space
between the AT and KR domains of module 4 was not large enough to accommodate a
functional DH domain. Thus, the extent of reduction at module 4 does not proceed beyond
the ketoreduction of the beta-keto formed after the condensation directed by module 4.
Because the C-12,13 unsaturation has been demonstrated (epothilones C and D), there
must be an additional dehydratase function that introduces the double bond, and this
function is believed to be in the PKS itself or resident in an ORF in the epothilone
biosynthetic gene cluster.

Thus, the action of the dehydratase could occur either during the synthesis of the
polyketide or after cyclization has taken place. In the former case, the compounds
produced at the end of acyl chain growth would be epothilones C and D. If the C-12,13
dehydration were a post-polyketide event, the completed acyl chain would have a
hydroxyl group at C-13, as shown below. The names epothilones G and H have been
assigned to the 13-hydroxy compounds produced in the absence of or prior to the action of
the dehydratase.

Epothilones G (R = H) and H (R = CH3).

Modules 5 and 6 each have the full set of reduction domains (KR, DH, and ER) to
yield the methylene functions at C-11 and C-9. Modules 7 and 9 have KR domains to yield
the hydroxyls at C-7 and C-3, and module 8 does not have a functional KR domain,
consistent with the presence of the keto group at C-5. Module 8 also contains a
methyltransferase (MT) domain that results in the presence of the geminal dimethyl
function at C-4. Module 9 has a thioesterase domain that terminates polyketide synthesis
and catalyzes ring closure. The genes, proteins, modules, and domains of the epothilone
PKS are summarized in the Table hereinabove.
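
The relationship between the reductive domains present in a module and the functional group left at the beta-carbon follows the general convention for modular PKSs, and can be sketched as below for the modules discussed above. The domain sets are those stated in the text (module 4 is entered with a KR only, reflecting the finding that it lacks a functional DH domain); the function and variable names are hypothetical.

```python
def beta_carbon_state(domains) -> str:
    """Functional group implied by the set of active reductive domains."""
    if "KR" not in domains:
        return "keto"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "double bond"
    return "methylene"


# Active reductive domains per module, as described in the text.
REDUCTIVE_DOMAINS = {
    4: {"KR"},              # no functional DH found in module 4
    5: {"KR", "DH", "ER"},
    6: {"KR", "DH", "ER"},
    7: {"KR"},
    8: set(),               # no functional KR
    9: {"KR"},
}

# Beta-carbon position associated with each module in the text.
BETA_CARBON = {4: "C-13", 5: "C-11", 6: "C-9", 7: "C-7", 8: "C-5", 9: "C-3"}

predicted = {
    BETA_CARBON[m]: beta_carbon_state(d) for m, d in REDUCTIVE_DOMAINS.items()
}
for carbon, state in predicted.items():
    print(carbon, state)
```

With module 4 limited to ketoreduction, the sketch yields a hydroxyl at C-13, which corresponds to the 13-hydroxy compounds named epothilones G and H above.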

Inspection of the sequence has revealed translational coupling between epoA and
epoB (loading domain and module 1) and between epoC and epoD. Very small gaps are
seen between epoD and epoE and between epoE and epoF, but gaps exceeding 100 bp are found
between epoB and epoC and between epoF and epoK. These intergenic regions may contain
promoters. Sequencing efforts have not revealed the presence of regulatory genes, and it is
possible that epothilone synthesis is not regulated by operon-specific regulation in
Sorangium cellulosum.

The sequence of the epothilone PKS and flanking regions has been compiled into a
single contig, as shown below.

1 TCGT3CGCGG GCACGTCGAG GCGTTTGCCG ACTTCGGCGG CGTCCCGCGC GTGCTGCTCT
61 ACGACAACCT CAAGAACGCC GTCGTCGAGC GCCACGGCGA CGCGATCCGG TTCCACCCCA
121 CGCTGCTGGC TCTGTCGGCG GATTACCGCT TCGAGCCGCG CCCCGTCCCC GTCGCCCGCG
181 GCAACGAGAA GGGCCGCGTC GAGCGCGCCA TCCGCTACGT CCGCGAGGGC TTCTTCGAGG
241 CCCGGGCCTA CGCCGACCTC GGAGACCTCA ACCGCCAAGC GACCGAGTGG ACCAGCTCCG
301 CGGCGCTCGA TCGCTCCTGG GTCGAGGACC GCGCCCGCAC CGTGCGTCAG GCCTTCGACG
361 ACGAGCGCAG C3TGCTGCTG CGACACCCTG ACACACCGrT TCCGGACCAC GAGCGCGTCG
421 AGGTCGAGGT CGGAAAGACC CCCTACGCGC GCTTCGATCT CAACGACTAC TCGGTCCCCC
481 ACGACCGGAC GCGCCGCACG CTGGTCGTCC TCGCCGACCT CAGTCAGGTA CGCATCGCCG
541 ACGGCAACCA GATCGTCGCG ACCCACGTCC GTTCGTGGGA CCGCGGCCAG CAGATCGAGC
601 AGCCCGAGCA CCTCCAGCGC CTGGTCGACC AGAAGCGCCG CGCCCGCGAG CACCGCGGCC
661 TTGATCGCCT CGCGCGCGCC GCCCGCAGCA GCCAGGCATT CCTGCGCATC GTCGCCGAGC
721 GCGGCGATAA CGTCGGCAGC GCGATCGCCC GGCTTCTGCA ACTGCTCGAC GCCGTGGGCG
781 CCGCCGAGCT CGAAGAGGCC CTGGTCGAGG TGCTTGAGCG CGACACCATC CACATCGGTG
841 CCGTCCGCCA GGTGATCGAC CGCCGCCGCT CCGAGCGCCA CCTGCCGCCT CCAGTCTCAA
901 TCCCCGTCAC CCGCGGCGAG CACGCCGCCC 7CGTCGTCAC GCCGCATTCC CTCACGACCT
961 ACGACGCCCT GAAGAAGGAC CCGACGCCAT GACCGACCTG ACGCCCACCG AGACCAAAGA
1021 CC3GCTCAAG AGCC7CGGCC TCTTCGGCCT GCTCGCCTGC TCGGAGCAGC TCGCCGACAA
1081 GCCC7GGCTT CGCGAGGTGC TCGCCATCGA GGAGCGCGAG CGCCACAAGO GCAGCCTCGA
1141 ACGCCGCCTG AAGAACTCCC GCGTCGCCGC CTTCAAGCCC ATGACCGACT TCGACTCGTC
1201 CTGGCCCAAG AAGATC3ACC GCGAGGCCGT CGACGACCTC TACGATAGCC GCTACGCGGA
1261 CCTGCTCTTC GAGGTCGTCA CCCG7CGCTA CGACGCGCAG AAGCCGCTCT TGCTCAGCAC
1321 GAACAAGGCA TTCGCCGACT GGGGCCAGGT CTTCCCGCAC GCCGCGTCCG TCGTCACGCT
1381 CGTCGACCGG CTCGTGCACC GCGCCGAGGT GATCGAGATC GAGGCCGAGA GCTACCGGCT
1441 GAAGGAAGCC AAGGAGCTCA ACGCCACCCG CACCAAGCAG CGCCGCACCA AGAAGCACTG
1501 AGCGGCATTT TCACCGGTGA ACTTCACCGA AATCCCGCGT GTTGCCGAGA TCATCTACAG
1561 GCGGATCGAG ACCGTGCTCA CGGCGTGGAC GACATGGCGC GGAAACGTCG TCGTAACTGC
1621 CCAGCAATGT CATGGGAATG GCCCCTTGAG GGGCTGGCCG GGGTCGACCA TATCGCGCGA
1681 TCTCCCCGTC AATTCCCGAG CGTAAAAGAA AAATTTGTCA TAGATCGTAA GCTGTGCTAG
1741 TGATCTGCCT TACGTTACGT CTTCCGCACC TCGAGCGAAT TCTCTCGGAT AACTTTCAAG
1801 TTTTCTGAGG GGGCTTGGTC TCTGGTTCCT CAGGAACCCT GATCGGGACG AGCTAATTCC
1861 CATCCATTT7 TTTGACACTC TGCTCAAAGG GATTAGACCG AGTGAGACAG TTCTTTTGCA
1921 GTGAGCGAAG AACCTGGGGC TCGACCGGAG GACGATCGAC GTCCGCGAGC GGGTCAGCCG
1981 CTGAGGATGT GCCCGTCGTG GCGGATCGTC CCATCGAGCG CGCAGCCGAA GATCCGATTG
2041 CGATCGTCGG AGCGGGCrGC CGTCTGCCCG GTGGCGTGAT CGATCTGAGC GGGTTCTGGA
2101 CGCTCCTCGA GGGCTCGCGC GACACCGTCG GGCAAGTCCC CGCCGAACGC TGGGATGCAG
2161 CAGCGTGGTT TGATCCCGAC CTCGATGCCC CGGGGAAGAC GCCCGTTACG CGCGCATCTT
2221 TCCTGAGCGA CGTAGCCTGC TTCGACGCCT CCTTCTTCGG CATCTCGCCT CGCGAAGCGC
2281 TGCGGATGGA CCCTGCACAT CGACTCTTGC TGGAGGTGTG CTGGGAGGCG CTGGAGAACG
2341 CCGCGATCGC TCCATCGGCG CTCGTCGGTA CGGAAACGGG AGTGTTCATC GGGATCGGCC
2401 CGTCCGAATA TGAGGCCGCG Cl'GCCGCGAG CGACGGCCTC CGCAGAGATC GACGCTCATG
2461 GCGGGCTGGG CACGATGCCC AGCGTCGGAG CGGGCCGAAT CTCGTATGTC CTCGGGCTGC
2521 GAGGGCCGTG TGTCGCGGTG GATACGGCCT ATTCGTCCTC GCTCGTGGCC GTTCATCTGG
2581 CCTGTCAGAG CTTGCGCTCC GGGGAATGCT CCACGGCCCT GGCTGGTGGG GTATCGCTGA
2641 TGTTGTCGCC GAGCACCCTC GTGTGGCTCT CGAAGACCCG CGCGCTGGCC ACGGACGCTC
2701 GCTGCAAGGC GTTTTCGGCG GAGGCCGATG GGTTCGGACG AGGCGAAGGG TGCGCCGTCG
2761 TGGTCCTCAA GCGGCTCAGT GGAGCCCGCG CGGACGGCGA CCGGATATTG GCGGTGATTC
2821 GAGGATCCGC GATCAATCAC GACGGAGCGA GCAGCGGTCT GACCGTCCCG AACGGGAGCT
2881 CCCAAGAAAT CGTGCTGAAA CGCCCCCTGG CGGACGCAGG CTGCGCCGCG TCTTCGGTGG
2941 GTTATGTCGA GGCACACGGC ACGGGCACGA CGCTTGGTGA CCCCATCGAA ATCCAAGCTC
3001 TGAATGCGGT ATACGGCCTC GGGCGAGACG TCCCCACGCC GCTGCTGATC GGGTCGGTGA
3061 AGACCAACCT TGGCCATCCT GAGTATGCGT CGGGGATCAC TGGGCTGCTG AAGGTCGTCT
3121 TGTCCCTTCA GCACGGGCAG ATTCCTGCGC ACCTCCACGC GCACGCGCTG AACCCCCGGA
3181 TCTCATGGGG TGATCTTCGG CTGACCGTCA CGCGCGCCCG GACACCGTGG CCGGACTGGA
3241 ATACGCCGCG ACGGGCGGGG GTGAGCTCGT TCGGCATGAG CGGGACCAAC GCGCACGTGG
3301 7GCTGGAAGA GGCGCCGGCG GCGACGTGCA CACCGCCGGC GCCGGAGCGG CCGGCAGAGC
3361 TGCTGGTGCT GTCGGCAAGG ACCGCGGCAG CCTTGGATGC ACACGCGGCG CGGCTGCGCG
3421 ACCATCTGGA GACCTACCCT TCGCAGTGTC TGGGCGATGT GGCGTTCAGT CTGGCGACGA
3481 CGCGCAGCGC GATGGAGCAC CGGC7CGCGG TCGCGGCGAC GTCGAGCGAG GGGCTGCGGG
3541 CAGCCCTGGA CGCTGCGGCG CAGGGACAGA CGCCGCCCGG TGTGGTGCGC GGTATCGCCG
3601 ATTCCTCACG CGGCAAGCTC GCCTTTCTCT TCACCGGACA GGGGGCGCAG ACGCTGGGCA
3661 TGGGCCGTGG GCTGTATGAT GTATGGCCCG CGTTCCGCGA GGCGTTCGAC CTGTGCGTGA
3721 GGCTGTTCAA CCAGGAGCTC GACCGGCCGC TCCGCGAGGT GATGTGGGCC GAACCGGCCA
3781 GCGTCGACGC CGCGCTGCTC GACCAGACAG CCTTTACCCA GCCGGCGCTG TTCACCTTCG
3841 AGTATGCGC7 CGCCGCGCTG TGGCGGTCGT GGGGCGTAGA GCCGGAGTTG GTCGCTGGCC
3901 ATAGCATCGG TGAGCTGGTG GCTGCCTGCG TGGCGGGCGT CTTCTCGCTT GAGGACGCGG
3961 TGTTCCTGGT GGCTGCGCGC GGGCGCCTGA TGCAGGCGCT GCCC-GCCGGC GGGGCGATGG
4021 TGTCGATCGC GGCGCCGGAG GCCGATGTGG CTGCTGCGGT GGCGCCGCAC GCAGCGTCGG
4081 TG7CGATCGC CGCGGTCAAC GGTCCGGACC AGGTGGTCAr CGCGGGCGCC GGGCAACCCG
4141 TGCATGCGAT CGCGGCGGCG ATGGCCGCGC GCGGGGCGCG AAC7AAGGCG CTCCACGTCT
4201 CGCATGCGTT CCACTCACCG CTCATGGCCC CGATGCTGGA GGCGTTCGGG CGTGTGGCCG
4261 AG7CGGTGAG CTACCGGCGG CCGTCGATCG TCCTGGTCAG CAATCTGAGC GGGAAGGCTG
4321 GCACAGACGA GGTGAGCTCG CCGGGCTAT? GGGTGCGCCA CGCGCGAGAC GTGGTGCGCT
4381 TCGCGGATGG AGTGAAGGCG CTGCACGCGG CCGGTGCGGG CACCTTCGTC GAGGTCGGTC
4441 CGAAATCGAC GCTGCTCGGC CTGGTGCCTG CCTGCCTGCC GGACGCCCGG CCGGCGCTGC
4501 TCGCATCGTC GCGCGCTGGG CGTGACGAGC CAGCGACCGT GC7CGAGGCG CTCGGCGGGC
4561 TCTGGGCCGT CGGTGGCCTG GTCTCCTGGG CCGGCCTCTT CCCCTCAGGG GGGCGGCGGG
4621 TGCCGCTGCC CACGTACCCT TGGCAGCGCG AGCGCTACTG GATCGACACG AAAGCCGACG
4681 ACGCGGCGCG TGGCGACCGC CGTGCTCCCC GAGCGGGTCA CGACGAGGTC GAGAAGGGGG
4741 GCGCGGTGCG CGGCGGCGAC CGGCGCAGCG CTCGGCTCGA CCATCCGCCC CCCGAGAGCG
4801 GACGCCGGGA GAAGGTCGAG GCCG7CGGCG ACCGTCCGTT CCGGCTCGAG ATCGATGAGC
4861 CAGGCGTGCT CGATCGCCTC GTGCTTCGGG TCACGGAGCG GCGCGCCCCT GGTCTTGGCG
4921 AGG7CGAGAT CGCCGTCGAC GCGGCGGGGC TCAGCTTCAA TGATGTCCAG CTCGCGCTGG
4981 GCATGGTGCG CGACGACCrG CCGGGAAAGC CCAACCCTCC GCTGCTGCTC GGAGGCGAST
5041 GCGCCGGGCG CATCGTCGCC GTGGGCGAGG GCGTGAACGG CCTTGTGGTG GGCCAACCGG
5101 TCATCGCCCT TTCGGGGGGA GCGTTTGCTA CCCACGTCAC CACGTCCGCT GCGCTGGTGC
5161 TGCCTCGGCC TCAGGCGCTC TCGGCGACCG AGGCGGCCGC CATGCCCGTC GCGTACCTGA
5221 CGGCATGGTA CGCGCTCGAC GGAATAGCCC GCCTTCAGCC GGGGGAGCGG GTGCTGATCC
5281 ACGCGGCGAC CGGCGGGGTC GGTCTCGCCG CGC7GCAGTG GGCGCAGCAC GTGGGAGCCG
5341 AGGTCCATGC GACGGCCGGC ACGCCCGAGA AGCGCGCCTA C7TGGAGTCG CTGGGCGTGC
5401 GG7A7GTGAG CGATTCCCGC TCGGACCGGT TCGTCGCCGA CGTGCGCGCG TGGACGGGCG
5461 GCGAGGGAGT AGACGTCGTG CTCAACTCGC TTTCGGGCGA GCTGATCGAC AAGAGTTTCA
5521 ATC7CCTGCG ATCGCACGGC CGGTTTG7GG AGCTCGGCAA GCGCGACTGT TACGCGGATA
5581 ACCAGCTCGG GCTGCGC-CCG TTCCTGCGCA ATCTCTCCTT CTCGCTGGTG GATCTCCGGG
5641 GGATGATGCT CGAGCGGCCG GCGCGGGTCC GTGCGCTCTT CGAGGAGCTC CTCGGCCTGA
5701 TCGCGGCAGG CGTG7TCACC CCTCCGCCCA 7CGCGACGCT CCCGATCGCT CGTGTCGCCG
5761 ATGCGTTCCG GAGCATGGCG CAGGCGCAGC ATCTTGGGAA GCTCGTACTC ACGCTGGGTG
5821 ACCCGGAGGT CCAGATCCGT ATTCCGACCC ACGCAGGCGG CGGCCCGTC7 ACCGGGGATC
5881 GGGATCTGCT CGACAGGCTC GCGTCAGCTG CGCCGGCCCC GCGCGCGGCG GCGCTGGAGG
5941 CGTTCCTCCG TACGCAGGTC TCGCAGGTGC TGCGCACGCC CGAAATCAAG GTCGGCGCGG
6001 AGGCGCTGTT CACCCGCCTC GGGATGGACT CGCTCATGGC CGTGGAGCTG CGCAATCGTA
6061 TCGAGGCGAG CCTCAAGCTG AAGC7CTCGA CGACGTTCCT GTCCACGTCC CCCAATATCG
6121 CCTTGTTGAC CCAAAACCTG TTGGATGCTC TCGCCACAGC TCTCTCCTTG GAGCGGGTCG
6181 CGGCGGAGAA CCTACGGGCA GGCGTGCAAA GCGACTTCGT CTCA7CGGGC GCAGATCAAG
6241 ACTGGCAAAT CATTGCCCTA TGACGATCAA TCAGCTTCTG AACGAGCTCG AGCACCAGGG
6301 TGTCAAGCTG GCGGCCGATG GGGAGCGCCT CCAGATACAG GCCCCCAAGA ACGCCCTGAA
6361 CCCGAACCTG CTCGCTCGAA TCTCCGAGCA CAAAAGCACG ATCCTGACGA TGCTCCGTCA
6421 GAGACTCCCC GCAGAGTCCA TCGTGCCCGC CCCAGCCGAG CGGCACGTTC CGTTTCCTCT
6481 CACAGACATC CAAGGATCCT ACTGGCTGGG TCGGACAGGA GCGTTTACGG TCCCCAGCGG
6541 GATCCACGCC TATCGCGAAT ACGACTGTAC GGATCTCGAC GTGGCGAGGC TGAGCCGCGC
6601 CTTTCGGAAA GTCGTCGCGC GGCACGACAT GCTTCGGGCC CACACGCTGC CCGACAtGAT
6661 GCAGGTGATC GAGCCTAAAG TCGACGCCGA CATCGAGATC ATCGATCTGC GCGGGCTCGA
6721 CCGGAGCACA CGGGAAGCGA GGCTCGTATC GTTGCGAGAT GCGATGTCGC ACCGCATCTA
6781 TGACACCGAG CGCCCTCCGC TCTATCACGT CGTCGCCGTT CGGCTGGACG AGCAGCAAAC
6841 CCCTCTCGTG CTCAGTATCG ATCTCATTAA CGTTGACCTA GGCAGCCl'GT CCATCATCTT
6901 CAAGGATTGG CTCAGCTTCT ACGAAGATCC CGAGACCTCT CTCCCTGTCC 7GGAGCTCTC
6961 GTACCGCGAC TATGTGCTCG CGCTGGAGTC TCGCAAGAA3 TCTGAGGCG3 ATCAACGAT*
7021 GATGGATTAC TGGAAGCGGC GCGTCGCCGA CCTCCCACCT CCGCCGATGC TTCCGATGAA
7081 GGCCGATCCA TCTACCCTGA GGGAGATCCG CTTCCGGCA3 ACGGAGCAAT GGCTGCCGTC
7141 GGACTCCTGG AGTCGATTGA AGCAGCGTGT CGGGGAGCGC GGGCTGACCC CGACGGGCGT
7201 CATTCTCCCT GCATTTTCCG AGGTGATCGG GCGC7GGAGC GCGAGCCCCC GGTTTACGCT
7261 CAACATAACG CTCTTCAACC GGCTCCCCGT CCATCCGCGC GTGAACGATA TCACCGGGGA
7321 CTTCACGTCG ATGGTCCTCC TGGACATCGA CACCACTCGC GACAAGAGCT TCGAACAGCG
7381 CGCTAAGCGT ATTCAAGAGC AGCTGTGGGA AGCGATGGAT CACTGCGACG TAAGCGGTAT
7441 CGAGGTCCAG CGAGAGGCCG CCCGGGTCCT GGGGATCCAA CGAGGCCCAT TGTTCCCCGT
7501 GGTGCTCACG AGCGCGCTCA ACCAGCAAGT CGTTGGTGTC ACCTCGCTGC AGAGGCTCGG
7561 CACTCCGGTG TACACCAGCA CGCAGACTCC TCAGCTGCTG CTGGATCATC AGCTCTACGA
7621 GCACGATGGG GACCTCGTCC TCGCGTCGGA CATCGTCGAC GGAGTGTTCC CGCCCGACCT
7681 TCTGGACGAC ATGCTCGAAG CGTACGTCGC TTTTCTGCGG CGGCTCACTG AGGAACGATG
7741 GAGTGAACAG ATGCGCTGTT CGCTTCCGCC TGCCCAGCTA GAAGCGCGGG CGAGCGCAAA
7801 CGAGACCAAC TCGCTGCTGA GCGAGCATAC GCTGCACGGC CTGTTC3CGG CGCGGGTCGA
7861 GCAGC7GCCT ATGCAGCTCG CCGTGGTGTC GGCGCGCAAG ACGCTCACGT ACGAAGAGCT
7921 TTCGCGCCGT TCGCGGCGAC TTGGCGCGCG GCTGCGCGAG CAGGGGGCAC GCCCGAACAC
7981 ATTGGTCGCG GTGGTGATGG AGAAAGGCTG GGAGCAGGTT GTCGCGGTTC TCGCGGTGCT
8041 CGAGTCAGGC GCGGCCTACG TGCCGATCGA TGCCGACCTA CCGGCG3AGC GTATCCACTA
8101 CCTCCTCGAT CATGGTGAGG TAAAGCTCGT GCTGACGCAG CCATGGCTGG ATGGCAAACT
8161 GTCATGGCCG CCGGGGATCC AGCGGCTGCT CGTGAGCGAT GCCGGCGTCG AAGGCGACGG
8221 CGACCAGCTT CCGATGATGC CCATTCAGAC ACCTTCGGAT C7CGCGTATG TCATCTACAC
8281 CTCGGGATCC ACAGGGTTGC CCAAGGGGGT GATGATCGAT CATCGGGGTG CCGTCAACAC
8341 CA.CCTGGAC ATCAACGAGC GCTTCGAAAT A.GGGCCCGGA GACAGAGTGC TGGCGCTCTC
8401 CTCGCTCAGC TTCGATCTCT CGGTCTACGA TGTGTTCGGG ATCCTGGCGG CGGGCGGTAC
8461 GATCGTGGTG CCGGACGCGT CCAAGCTGCG CGATCCGGCG CATTGGGCAG CGTTGATCGA
8521 ACGAGAGAAG GTGACGGTGT GGAACTCGGT GCCGGCGCTG ATGCGGATGC TCGTCGAGCA
8581 TTCCGAGGGT CGCCCCGATT CGCTCGCTAG GTCTCTGCGG CTTTCGCTGC TGAGCGGCGA
8641 v.TGGATCCCG GTGGGCCTGC CTGGCGASC? CCAGGCCATC AGGCCCGGCG TCTCGGTGAT
8701 CAGCCTGGGC GGGGCCACCG AAGCGTCGA7 CTGGTCCATC GGGTACCCCG TGAGGAACGT
8761 CGATCCATCG TGGGCGAGCA TCCCCTACGG CCGTCCGCTG CGCAACCAGA CGTTCCACGT
8821 GCTCGA7GAG GCGCTCGAAC CGCGCCCGGT CTCGGTTCCG GGGCAACTCT ACATTGGCGG
8881 GGTCGGACTG GCACTGGGCT AC7GGCGCGA TGAAGAGAAG ACGCGCAACA GCTTCCTCGT
8941 GCACCCCGAG ACCGGGGAGC GCCTCTACAA GACCGGCGAT CTGGGCCGCT ACCTGCCCGA
9001 TGGAAACATC GAGTTCATCG GGCGGGAGGA CAACCAAATC AAGCTTCGCG GATACCGCGT
9061 TGAGCTCGGG GAAATCGAGG AAACGCTCAA GTCGCATCCG AACGTACGCG ACGCGGTGAT
9121 TGTGCCCGTC GGGAACGACG CGGCGAACAA GCTCCTTCTA GCCTATGTGG TCCCGGAAGG
9181 CACACGGAGA CGCGCTGCCG AGCAGGACGC GAGCCTCAAG ACCGAGC3GG TCGACGC3AG
9241 AGCAGACGCC GCCAAAGCGG ACGGATTGAG CGACGGCGAG AGGGTGCAG7 TCAAGCTCGC
9301 TCGACACGGA CTCCGGAGGG ATCTGGACGG AAAGCCCGTC GTCGATCTGA CCGGGCTGGT
9361 TCCGCGGGAG GCGGGGCTGG ACGTCTACGC GCGTCGCCGT AGCGTCCGAA CGTTCCTCGA
9421 GGCCCCGATT CCATTTGTTC AATTCGGCCG ATTCCTGAGC TGCCTGAGCA GCGTGGAGCC
9481 CGACGGCGCG GCCCTTCCCA AATTCCGTTA TCCATCGGCT GGCAGCAGGT ACCCGGTGCA
9541 AACCTACGCG TACGCCAAAT CCGGCCGCAT CGAGGGCGTG GACGAGGGCT TCTATTATTA
9601 CCACCCGTTC GAGCACCGTT TGCTGAAGGT CTCCGATCAC GGGATCGAGC GCGGAGCGCA
9661 CGTTCCGCAA AACTTCGA7G TGTTCGATGA AGCGGCGTTC GGCCTCCTGT TCGTGGGCAG
9721 GATCGATG7C ATCGAGTCGC TGTATGGATC GTTGTCACGA GAATTCTGCC TGCTGGAGGC
9781 CGGATATATG GCGCAGCTCC TGATGGAGCA GGCGCCTTCC TGCAACATCG GCGTCTGTCC
9841 GGTGGGTCAA TTCGATTTTG AACAGGTTCG GCCGGTTCTC GACCTGCGGC ATTCGGACGT
9901 TTACGTGCAC GGCATGCTGG GCGGGCGGGT AGACCCGCGG CAGTTCCAGG TCTGTACGCT
9961 CGG7CAGGAT TCCTCACCGA GGCGCGCCAC GACGCGCGGC GCCCCTCCCG GCCGCGATCA
10021 GCACTTCGCC GATATCCTTC GCGACTTCTT GAGGACCAAA CTACCCGAGT ACATGG7GCC
10081 TACAGTCTTC GTGGAGCTCG ATGCGTTGCC GCTGACGTCC AACGGGAAGG TCGATCGTAA
10141 GGCCCTGCGC ^AGCGGAAGG ATACCTCGTC GCCGCGGCAT TCGGGGCACA CGGCGCCACG
10201 CCACGCCTTG GAGGAGATCC TCGTTGCGGT CGTACGGGAG GTGCTCGGGC TGGA3GTGGT
10261 TGGGCTCCAG GAGAGCTTCG TCGATCTT3G TGCGACATCG ATTCACATCG TTCGCATCAG
10321 GAGTCTGTTG CAGAAGAGGC TGGATAGG3A GATCGCCATC ACCGAGTTGT TCCAGTACCC
10381 GAACCTCGGC TCGCTGGCGT CCGGTTTGCG CCGAGACTCG AAAGATCTAG AGCAGCGGCC


WO 00/31247 PCT/US99/27438

10441 
10501 
10561 
10621 
10681 
10741 
10801 
10861 
10921 
10981 
11041 
11101 
11161 
11221 
11281 
11341 
11401 
11461 
11521 
11531 
11641 
11701 
117 61 
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GAACATGCAG 
AACAAAACCA 
ATCTGATCTG 
CTGAGGAACG 
TCGGGCCGTT 
ACGGAGGCCG 
CTGGTGCTGG 
GACGCTGCTT 
ATCTrCATGG 
GAGGGCTCTA 
CACGAGCACC 
AAG GATT ACC 
GTTCAAACTG 
GACCGCGAGT 
GGCTATGTAT 
GCCAAGGCGA 
GACCGGGCGC 
AACGACGGAG 
ATCATGGAGG 
CACGGGACCG 
GATCGCGACG 
CACCTCGAAT 
CGGCAGC7GC 
AGCCCGT7CT 
GCCGGCGTCA 
CCCGCGGCGA 
GCCAAGAGCG 
CACCAGGGGC 
GAGCACCGGC 
GCGGCGCGAG 
CCGAAGGTGG 
C7CCTGGCTG 
GCCGAAGCTG 
GAGCGCATCG 
TGGCGGTCGT 
GCCGCGCATG 
AGCCGGCTGC 
GCCGAGGCCG 
AGCCCGCGCT 
CTGAACGCGA 
CAG3TCGACC 
GCTGCGGTGC 
GCGAATTACT 
CAGCTCCAAG 
TCGGTCGAGG 
CGAGGCCAGG 
TACCCTGTAC 
TA7CCCTGGC 
CGCCGCGGCG 
CAGACGAGCA 
GACCACCGGG 
TCGTCGGGGG 
GAGGCGCTGG 
CCGTCGGGAC 
TTCCGGGTCC 
CTTACGCTTT 
GCGGAGCTGA 
TGGCGCGGTG 
GCGGAGTATC 


C-ACCGAGTGG 

GGCCGAGCGG 

ATCCCGGGTA 

GTGAGCT3AT 

TTCCGGG3GC 

TGCAGCGCTT 

ACCCGAACTA 

TCTTCGGCAT 

AATGCGCCTG 

TCGGCGTGTA 

CAGCGA7GAT 

TCGCGACCCA 

OCTGCTCTAC 

GCGACATGGC 

ATGCTGAGGG 

ACGGCACGAT 

TCTCCGATGG 

CGAGGAAGAT 

C3CTGGCGCT 

GCACGCTGCT 

CTTCGACCCG 

CGGCGGCTGG 

CCCCCAGCCT 

ACGTCAATAC 

GCTCGTTCGG 

AGCTTCCAGC 


CAGCGGCGCT 


TTTCGTTGGG 

TCGCGATGGC 

GCCAGACCCC 

TCTTCGTCTT 

AGGAACCCGT 

GTPGGTCGCT 

ACGTGGTGCA 

GGGGTGTCGG 

TGGCCGGGGC 

TCCGGCGCAT 

AGGCAGCGCT 

CGACGGTGCT 

AGGGGGTG'l'T 

CGCTGCGCGA 

CGATGCGCTC 

GGATGAACAA 

GCGGCCACGG 

AGATGCGGCG 

ACGAGCGCCC 

CCTGGGGGCG 

AGCGCGAGCG 

TGCGTGCGGG 

CGCGGCTGTG 

TGCAGGGAGC 

CCGAGGCTTT 

CCTTCCCGGG 

GGCTGCAGTT 

ACGCTCGCGG 

CCGCCGTGCG 

CCGAGATGGG 

AGGGCGAGGC 

GGTTGCATCC 


AGGCTCGGCG 
GCCAATGAAC 
CGCGTCGCGG 
GGAAGAACAA 
GCGGGATCTG 
CTCCGAGCAG 
CGTCCGGGCC 
CAGCCCGCGC 
GGAGGCGCTG 
CGCCGGCCCC 
GCGGTGGCCC 
CGTCTCCTAC 
CTCGCTCGTG 
GCTGGCCGGC 
GGGCATCTTC 
CATGGGCAAC 
TGATCCCGTC 
CGGGTTCACT 
GGCAGGGGTC 
CGGACACGCC 
GAGGTCTTGC 
CATCGCCGGT 
GAACTTCGAG 
CTCTCTTAAG 
GATCGGCGGC 
CGCGGCGCCG 
GGA'iGCCGCG 
CGACGTCGCC 
GGCACCGTCG 
GCCGGGCGCC 
TCCCGGCCAG 
CTTCCACGCG 
GCTCGCCGAG 
GCCGGTGCTG 
GCCCGACGTC 
GCTGTCGCTC 
CAGCGGTCAG 
CCGAGGCTAC 
CTCGGGCGAG 
CTGCCGTCGG 
GGACCTCTTG 
GACGGTGACG 
TCTCAGGCAG 
TCTGTTCGTG 
CGCGGCCCAG 
GGCGATGCTG 
GCTGTTTCCC 
GTACTGGATC 
CGGTCACCCG 
GGAGACGACG 
GGTCGTGTTT 
GGGCGAT3GC 
CGACGCGGCG 
CCAGATCGCG 
CGCGTTGCTC 
CGCACGGCTC 
GC7GCAGTAC 
GCTGGGACGG 
TGCGCTGCTG 


CAA3GGCAGG 
CGCAAGCCCG 
GTGTGCGCGT 
GAGTCCTCCG 
GACGAATTCT 
GAGCTCGCGG 
GGCAGCGTGC 
GAGGCAGAGC 
GAGAACGCCG 
AACATGAGCT 
GGCTGGTTTC 
AGGCTGAATC 
GCGGTTCACT 
GGGATTACCG 
TCTCCCGACG 
GGCTGCGGGG 
CGCGCGGTCA 
GCGCCCAGTG 
GAGGCCCGGT 
ATCGAGACGG 
GGGATCGGCT 
TTGATCAAGA 
TCTCCTAACC 
GATTGGAATA 
ACCAACGCCC 
GCGCGCTCTG 
GCGGCACGGC 
TTCAGCCTGG 
CGCGAGGCGT 
GTCCGTGGCC 
GGCTCTCAGT 
GCGCTTTCGG 
CTCGCCGCCG 
TTCGCGCTCG 
GTGATCGGCC 
GAGGATGCGG 
GGCGAGATGG 
GAGGATCGGG 
CCGGCASCGA 
GTGAAC3TGG 
GCAGCGCTGG 
GGCGCCATGG 
CCTGTGCGCT 
GAGATGAGCC 
CGGGCGGGCG 
GAGGCGCTGG 
GCGGGGGGGC 
GAAGCGCCGG 
CTCCTCGGTG 
CTCGATCTCA 
CCGGGCGCGG 
CCATTGCAGA 
GTGTTGGTCC 
AGCCGGGCGC 
CGAGTGGAGC 
CAGGCCAGCA 
3GCCCTGCC7 
GTACGCCTGC 
GACGCGTGCT 


AGACGTAGCT 
CCTGCGTCAC 
TGAGCCGTGT 
CTATCGCAGT 
GGA3GAACCT 
CGTCCGGAGT 
TCGAAGATG? 
TCATGGATCC 
GATACGACCC 
CGTACTTGAC 
AGACGTTGAT 
TGAGAGGGCC 
TGGCGTGCAT 
TCCGGATCCC 
GCCATTGCCG 
TTGTCCTCCT 
TCCTTGGGTC 
AGGT3GGCCA 
CCATCCAATA 
CGGCGTTGCG 
CCGTGAAGAC 
CGGTCTTGGC 
CATCGATCGA
CCGGCTCGAC 
ATG7CGTGCT 
CCGAGCTCTT
TACGAGATCA 
CGACGACGCG 
TGCGAGAGGG 
GCTGCTCCCC 
GGGTCGGTAT 
CGTGCGACC3 
ACGAAGGGTC 
CGGTGGCATT 
ACAGCATGG-3 
TGGCGATCAT 
CGGTGACCGA 
TGAGCGTGGC 
TCGGCGAGGT 
ATGTCGCCAG 
GCGGGCTCCG 
TAGCGGGCCC 
TCGCCGAGGT 
CGCATCCGAT 
CAGCGGTGGG 
GCGCGCTGTG 
GGCGCCTACC 
CCAAGAGCGC 
AAATGCAGAC 
AGC3GCTGCC 
CGTACCTGGA 
TAACCGACGT 
AGGTGGTGAC 
CGGGCGCTCC 
GCACCGAGGT 
TGCCCGCCGC 
TCCAGGGGAT 
CCGACGCGGC 
TCCAGGTCGT 


AAGAGCGCCG 
CCT3GGACTC 
TGCTCGAACG 
CATCGGCATG 
TCGAGACGGC 
CGACCCAGCG 
CGACCGGTTC 
GCAGCACCGC 
GACAGCCTAC 
GTCGAACCTC 
CGGCAACGAC 
GAGCATCTCC 
GAGCCTCCTG 
CCATCGAGCC 
GGCCTTCGAC 
GAAGCCGCTG 
TGCCACAAAC 
GGCGCAAGCG 
CATCGAGACC
GCGGGTGTTC 
CGGCATCG3A 
GCTGGAGCAC 
TTTCGCGAGC 
TCCGCGGCGG 
GGAGGAAGCA 
CGTCGTCTCG 
TCTGCAGGCG 
CAGTCCCATG 
GCTCGACGCA 
AGGCAACGTG 
GGGCCGTCAG 
GGCCATCCAG 
GTCCCAGATC 
TGCGGCGCTC 
CGAGGTAGCC 
CTGCCGGCGC 
GCTGTCGCT3 
CGTGAGCAAC 
GCTGTCGTCC 
CCACAGCCCG 
GCCGCGTGCG 
GGAGCTCGGA 
AGTCCAGGCG 
CCTAACCACT 
CTCGCTGCGG 
GGCGCAGGGC 
GCTGCCGACC 
CGCGGGCGAT 
CCTATCAACC 
GTGGCTCGGC 
GATGGCGATT 
GGTGCTCGCC 
GACGGAGCAG 
CCACGCGTCC 
CCCGGCTGGG 
GGCCACCTAC 
TGCTGAGCTA 
CGGCTCG3CA 
CGGCAGCCTC 
[Figure: reaction scheme, only partly legible. Reagents shown: (1) MeOH, H+; (2) Bu2AlH; (PhO)2PON3, Et3N; (1) a Ph3P reagent bearing CO2Et; (2) NaOH; RSH. The substituent in the intermediates is H, Me, or Et, and the scheme ends in thioester products of the form R'...SR. Ring-structure panels list the following substituents: X = CH2, O, S and Y = CH2, O, S; X = H, Me, Et, CH2OH, Br and Y = O, S; X = H, Me, Et, Br, OH and Y = NH, O, S; X = CH, N and Y = CH, N; X = NO2, CN, Me, O-alkyl, halo, etc. and Y = CH, N; X = NO2, CN, alkyl, aryl, halo, O-alkyl, etc. and Y = CH, N; X = CH2, O, S, NH, N-alkyl, N-aryl and Y = CH2, O, S, NH, N-alkyl, N-aryl.]
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13981 

14041 

14101 

14161 

14221 

14281 

14341 

14401 

14461

14521 

14581 

14641 

14701 

14761 

14821 

14881 

14941 

15001 

15061 

15121 

15181 

15241 

15301 

15361 

15421 

15481 

15541 

15601 

15661 

15721 

15781 

15841

15901 

15961 

16021 

16081

16141 

16201 

16261 

16321 

16381 

16441 

16501 

16561 

16621 

16681 

16741 

16801 

16861 

16921 

16981

17041 

17101 

17161 

17221 

17281 

17341 

17401 

17461 


TTCGCCGGCG 
TTGCAGCGGC 
ACCCCCGATC 
GAAGTCAGCG 
GATTGGTTCC 
GGCCGGTGGC 
GAGGCCGGCG 
CGCGCGCTCC 
AGCCTCGATG 
CCCCGGAGCG 
AGCGTGC7CT 
TGGCTTC7GA 
CCGCTGCTGG 
GTCGACCTCG 
GCCGACGACG 
GTCCCCCGGC 
GTCACCATCC 
AGCGTGGCCG 
GGCGCGGCGA 
GTCACCGTGG 
GTTACCACGT 
GGGCTGCTGA 
GGGGCCTTGC 
GCTTCGGGAG 
TTCCTCGACG 
TGGGGCCTGT 
GTCTCCCGCG 
CYCGAAAGCG 
CTCTACCCCG 
AGC3CCGGCG 
AGCGCGCGGA 
CTCCCCGAGG 
ATGGGGCTCG 
CTGTTGTGGA 
GAAGCCGCTC 
ATGTCGCAGG 
CTCGCGGTCC 
TGGAGGAGCG 
TCCTCGGTAT 
TGCTCGACGC 
TCGCTCCCGT 
TCGATGCTGC 
GTCTGTTGCT 
TCGACGGGAG 
TCGCTCGGTT 
TCGCCGCCGG 
CGGCGTGCTC 
AGAGCGATCT 
CCGCGGCGCG 
CCAACGGGTT 
CGCAACGGGA 
GCCGGTCGAC 
CGCTGCGGAG 
GGACCTCGCT 
GCTCCGACGG 
CCGCGGCAGG 
CGAGAAACCT 
CGTTGGCGAC 
GCTCGTTCGG 


GTGGCGAGGC 
CTTCGGGGGA 
GGCAGGGCGC 
GGCTCGTGGC 
TGGAGCTCGA 
TCCTCCTCGG 
GCCATGCCGT 
TGGCAAAGGC 
GGGGTGGCGA 
CCGACGTCAG 
GCACCGTGCA 
CCCGCGGCGC 
GGCTGGGCCG 
ATCCGACCCG 
CCGAAGCGGA 
AGCCCGAGAC 
GCGCGGACAG 
GATGGCTGGC 
GCGTGGAGCA 
CGAAGGCAGA 
CGGGGATGCC 
TGCAGCAGAC 
ACCTGCACGC 
TAGGGCTCTT 
C7CTGGCGCA 
TCGCGGAGGT 
GAATGCGGAG 
GCCGCGTGCA 
CGGCGGCGTC 
GGCCAGCCGG 
GCGGGCTCCT 
GCAAGATCGA 
AGCTGCGCAA 
CCTATCCCAC 
CTGTGGAGTC 
ACGATCTGAC 
TACGGCACAG 
GCTCGCTGGG 
CGGCTGCCGC 
GGAGCGCGAC 
CGAGGCCGTG 
GTTCTTCGGC 
GGAGGTCGCT 
CCGCACCGGT 
GCCGCGCGAG 
ACGGCTGTCG 
GTCATCGCTG 
CGCGTTGGCG 
CACGCAAGCG 
CGTCCGTGGC
TGGCGACCGC 
CGGGTTGACC 
CGCCCACGTC 
GGGCGATCCC 
CACACGCTGC 
CGTAGCGGGC 
CAACTTCCGC 
CGAGCCGGTG 
GATGAGCGGA 


GACGCCGTCG
GCTGTGGTGC 
CGACTTTTGG 
GCAGCGGCTT 
GTGGGAACCC 
CGGCGGCGGT 
CGTCCATGCG
CTTTGACGGC 
GCTCGACCCA 
TCCCGATGCC 
GGCCCTGGCC 
ACAGGCCGTC
CGTCATCGCC 
GCCCGATGGG 
AGTCGCGTTG 
CCGGCCCCGG 
CACCTACCTT 
CGAGCGCGGC 
ACGGGCAGCC 
TGTCGCCGAT 
GCTGCGGGGC 
TCCCGCGCGG 
GTTGACGCGC 
GG3C7CGCCG 
CCACCGGAGG 
GGGCATGGCG 
CCTCACCCCC 
GGTGGGGG7G 
T7CGCGAA7G 
GGACGGGGAC 
GGAGCCGCTC 
GG7GGACGCC 
CCGCA7CGAG 
GGTGGCGGCG 
ACCGCACACC 
GCAG77GATC 
CAGAATCCGC 
C7CGCACAGG 
TTCCC7GGCG 
GCGGTCCAGC 
CCGCACTGGG 
ATC7CGCCTC 
TGGGAGGGGC 
GTGTTCGTCG 
GAGCGAGACG 
TACACGC7GG 
GTGGCGATTC 
GGAGGGGTCA 
CTGTCGCCCG 
GAGGGC7G7G 
A7CTGGGCGC 
GCGCCCAACG 
GAAGC7GCCG 
A7CGAGG7CG 
G7GCTGGGCG 
CTGATCAAGG 
ACGCTCAATC 
CCG7GGCCGC 
ACGAACGC3C 


GTGCCCGTGG 
CATGCGCGCG 
GTGGTCGACA 
CCGGGAGGGG 
GCAGCGGTCG 
GGGC7CGGCG 
GCAGAGAGCA 
CAGGCTCCGA 
GGGCTCGGGG 
CTCGATCCGC 
GGCATGGGCT 
GGCGCCGGCG 
ATGGAGCACG 
GAGC7CGGTG 
CGCGGTGGCG 
GGGAGGATCG 
GTGACCGGCG 
GCTGGTCACC 
GTCGCGGCGC 
CGGCCGCAGC 
G7CGTCCA7G 
TTTCGTAAGG 
GAAGCGCCGC 
GGCCAGGGCA 
GCGCAGGGGC 
GCCGCGCAGG 
GACGAGGGGC 
ATGCCGGTGA 
TTG7CGCGCC 
CTGCTCCGCC
CTCCGCGCGC 
CCGCTCACGA 
CCCATGCTGG 
C7GAGCGGGC 
ACCGCCGATT 
GCAGCAAAAT 
TGAAACAAGC 
CGGAGCTGGA 
GTGCGGACGC 
CGCTCGACAG 
CGGGGCTGC7 
GGGAGGCGCG 
TCGACGACGC 
GCGCTTTCAC 
CG7ACAGCGC 
GGCTGCAGGG 
ACCTCGCCTG 
GCACGC7CCT 
ATGGTCGTTG 
GCC7GG7CGT 
7GATCCGGGG 
TGCToGCTCA 
CCG7CGA7TA 
AGGCGCTGCG 
CGG7GAAGAC 
CAGCGCTTTC 
CGCGGATCCG 
GCACGGACCG 
ATG7GGTGCT 


AAG7GGGCTC 
TCGTGAACCA 
GCTCGGG7GC 
TGCGCCGGCG 
GCACAGCCAA 
CCGCGT7GCG 
ACACGAGCGC 
CGGCGGTGG7 
CGCAAGGCGC 
CGCTGGTACG 
TTCGAGACGC 
ACGTCTCCGT 
CGGATCTGCG 
CCCTGCTGGC 
AGCGATGCG7 
AGAGCTGCGT 
GTCTGGGTGC 
7GGTGC7GGT 
TCGAGGCCCG 
TCGAGCGGA7 
CGGCCGGCAT 
TGA7GGCGCC 
T7TCCTTCTT 
ACTACGCCGC 
TGCCAGCG77 
AAGA7CGCGG 
TG7CCGCTCT 
ACCCGCGGCT 
TGGTGACGGC 
GCC7CCC7GC 
AGATCTCGCA 
GCCTGGGCAT 
GCATCACCGT 
ATCTGGCGCG 
C7GCTG7CGA 
TCAAGGCGCT 
GGCCATCATC
ACG3ACCGAG 
TCCGGAAGCG 
GCGCTGGGCG 
CACCGAGCCG 
ATCGCTCGAC 
CGGTATCCCG 
GGCGGACTAC 
CACCGGCAAC 
ACCTTGCCTG 
CCGCAGCCTG 
CTCCCCCGAC 
CCGGACCTTC 
CCTCAAACGG 
CTCGGCCATC 
GGAGACGGTC 
CGTCGAGACC 
GGCGACGGTG 
CAACATCGGC 
GCTGACGCAC 
GCTCGAGGGC 
TCCGCGCTTC 
GGAAGAGGCG 


GCTGCGGCTC 
CGGGCGCCAA 
AGTGGTCGCC 
CGAAGAAGAC 
GGTCAACGCG 
CTCGATGCTG 
TGCCCGCGTA 
GCACCTCGGC 
ATTGGACGCG 
TGGCTGTGAC 
CCCGCGATTG 
GACACAGGCA 
CTGCGCTCGG 
CGAGCTGCTG 
CGCTCGGATC 
TCCGACCGAC 
GCTCGGTCTG 
GGGCCGCTCC 
CGGCGCGCGC 
CCTCCGCGAG 
CTTGGACGAC 
CAAGGTCCAG 
CGTGCTGTAC 
GGCCAACACG 
GAGCGTCGAC 
CGCGCGGCTG 
GGCACGGCTG 
GTGGGTGGAG 
GCATCGCGCG 
TGCCGAGCCG 
GGTGCTGCGC 
GAACTCGCTG 
ACCCGCAACG
GGAGGCATGC 
GATCGAGGAG 
TACATGACTA 
ATTCAGCGCC 
CCGATCGCCA 
TTTTGGGAGC 
CTGGTAGGTG 
ATAGATTGC? 
CCGCAGCATC 
CCCCGGTCCA 
GCGCGCACGG 
ATGCTCAGCA 
ACCGTCGACA 
CGCGCAGGAG 
ATGATGGAAG 
GATGCTTCGG 
CTCTCCGACG 
AACCATGATG 
TTGCGCGAGG 
CACGGAACAG 
GGGCCGGCGC 
CATCTCGAGG 
GAGCGCATCC 
ACCGCGCTCG 
GCGGGGGTGA 
CCGGCGGTGG 
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18361 
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AGCTGTGGCC 
AGGGGGCGC? 
TCGGGCTCGG 
TCGCGGTCCC 
GGCAGACGCC 
TGCTGTTCAC 
GGCCAGCGTT 
GCCCGCTGCG 
AGACGGCGTT 
GGTC3TGGGG 
CGTGCGTGGC 
GGCTGATGCA 
AGGTGGCCGC 
CGGAGCAGGT 
CGGCGCGCGG 
TGGAACCGAT 
GCG7TTCGCT 
GCTACTGGGT 
ACGAAGCCGG 
TGCCAGCTTG 
AGGAGGCTGC 
GCTGGCCGGG 
AGCGGCAGCC
CGCTCCCCCA 
CGCGGCGAGC 
CGGCCGCGGC 
AGGCCTCCGC 
GGGTGCTGTA 
TCGGCAAAGr 
CGGGGCCGCG 
AGCCTGACGC 
AGCATCCCGG 
TCGAGGCCCT 
AGGGGCGCCG 
TGTCGCTGTC 
TCGTTGCGCG 
GATTGCCCGA 
CGGCGATCGA 
CCGATGCCGA 
TGCACGCCGC 
CCCGGGTGTT 
AGCCGCTGGA 
AGGGCAGCTA 
AGGGGCTCGC 
AGGCGCAGCG 
TGGCGGGGAT 
ATTGGGCCCA 
TGGTAACTGT 
ACGCGTCTCT 
GGGTGATGGG 
GCCTCGACTC 
CGCTGTCGGC 
TGAGCCASGC 
CAGAGGACCC 
TGGAGTCCTA 
ACCGGTGGAA 
ACGTGCCCAG 
TCTCGCCTCG 
GGGACCCGAT 


TGCCGCGCCG 
CGACGCGCAG 
GGACGTGGCG 
GGTGACGTCG 
GGCGGGGGCG 
CGGACAGGGC 
CCGGGAGGCG 
CGAGGTGATG 
CACCCAGCCC 
CGTAGAGCCG 
GGGGGTGTTC 
GGGGCTCTCG 
GGCGGTGGCG 
GGTGATCGCG 
CGTGCGCACC 
GCTGGAGGAC 
GGTGAGCAAC 
GCGGCACGTG 
CGCGGGCACG
CGTGGCGGAG 
GGGGGTGCTC 
CGTCTTCCCC 
GTACTGGATC 
GTGGTTCTAC
CCGGTCCGGC
GGCGCTTTCG 
GGTCGCCGAG 
CCTGTGGGGT 
CACCCATCTT 
CTCACCCCGG 
TGCCCCCTGT 
CTCCTGGGGC 
GGTGGCCGAG 
GCGCGCAGCG 
TGCGGAGGGG 
GTGGTTGGTG 
CCGCGAGGAA 
GGCGCTGGAG 
AGGCATGCCG 
GGGTCTGCTC 
GCGCCCCAAG 
CCTCTTCGTA 
CGCGGCAGGC 
CGCCCTGACC 
CCGGGAACAT 
GGAATGGCTG 
TGCGGGAGCG 
CACGAAAGCG 
TGTGGAGACC 
CTTTACCGAC 
CCTGATGGCT 
GACGCTGGCG 
GCTGGAGCTG 
GATCGCCATC 
CTGGCAGCTG 
TGGGGCAGAC 
GGGrGGCTTT 
GGAGGCGATG 
CGAGCGCGCG 


GAGCGCTCGG 
GCGGCGCGGC 
TTCAGCCTGG 
CGCGAGGGGC 
GCGCGCTGCA 
GCGCACACGC 
TTCGACCGGT 
TGGGCGGAGG 
GCGCTCTTCG 
GAGCTCCTGG 
TCGCTGCAAG 
GCGGGCGGCG 
CCGCACGCGG 
GGCGTGGAGC 
AAGCGGCT3C 
TTCGGGCGGG 
CTGAGCGGGA 
CGGGAGGCGG 
TTCCTCGA^G 
GCGGAGCCGA 
GAGGCGCTGG 
ACGGCTGGGC 
GAGGCGCCGG 
CGGGTGGACT 
GGGTGGCTGG 
TCGCACGGAT 
CAGGTGACCC 
CTGGACGCCG 
GCCACGGCGC 
CTCTGGATCG 
CAGGCGGCGC 
GGGCTCGTGG 
CTGCTTTCGC 
CGGCTCGTGG 
AGTTACTTGG 
GAGCGCGGGG 
TGGGGCCGAG 
GCGCAGGGCG 
GCGCTCTTGG 
GACGACGGGC
GTGGAG3GGG 
CTGTTTTCCT 
AATGCCrTTT 
ATCGCCTGGG 
GAGGCATCGG 
CTCGGTACGC 
GCTCCGCGCG 
GCCTCCTCCT 
CGCTCGGCGC 
CAAGGCACGC 
GTGGAGATCC 
TTCGACCATC 
CAGGACCGCA 
GTGGGTGCCG 
TTGACCGAGG 
GCGCGCGGCC 
CTGCGCGAGG 
AGCCTGGACC 
GGCCAGGACC 


CGGAGCTTTT 
TGCGCGAGCA
CGACGACGCC 
TGCTGGCGGC 
TCGCGAGCTC 
CGGGCATGGG 
GCGTGACGCT 
CGGGGAGCGC 
CGGTGGAGTA 
TTGGGCATAG 
ATGGGGTGAG 
CGATGGTGTC 
CGTGGCTCTC 
AAGCGGTGCA 
ATGTCTCGCA 
TGGCGGCGTC 
AGGTGGTCAC 
TGCGCTTCGC
TGGGCCCGAA 
CGTTGCTGGC 
GCAGGCTGTG 
GGCGGGTGCC 
CCGAACGGCT 
GGCCCGAGAT 
TGC7GGCCGA 
GTTCGTGCGC 
AGGCCCTCGG 
TCGTGGAGGC 
CGGTGCTCGC 
TGACCCGAGG 
TGTGGGGTAT 
ACCTGGATCC 
CGGACGCCGA 
CCGCCCCACC 
TGACGGGTGG 
CGGGGCACCT 
ATCAGCCGCC 
CGCGGGTCAC 
CGGCCGTCGA 
TGCTGGCCCA 
CATGGGTGCT 
CGGCGTCGGG 
TGGACGCGCT 
GCCTGTGGGC 
GAATCTGGGC 
CCGCGACGCA 
ACGCGAGCCG
CGGCCGTGCC 
TCTACGAGCT 
TCGACGTGCG 
GCAAACGGCT 
CGACCGTGGA 
CCGACGTCCG 
CCT3CCGCTT 
GCGTGGTGGT 
CCGGCTCGGG 
TGGAGACGTT 
CGCAACAGCG 
CGTCGGCGCT 


GGTGCTGTCG 
CCTGGACATG 
CAGCGCGATG 
GCTTTCGGCC 
CTCGCGCGGC 
CCGGGGGCTC 
GTTCGACCGG 
CGAGTCGTTG 
CGCGCTGACG 
CATCGGGGAG 
GCTCGTGGCG 
GCTCGGAGCG 
GATCGCGGCG
GGCGATCGCG 
CGCGTTCCAC 
GGTGACGTAC 
GGACGAGCTG 
GGACGGGGTG 
GCCGACGCTG 
GTCGTTGCGC 
GGCCGCTGGC 
GCTGCCGACC 
CGGAGCCACG 
GCCTCGCTCA 
CCGGGGTGGA 
CGTGCTCCAT 
TGGCCGCAAC 
GGGGGCATCG 
GCTGATTCAG 
GGCCTGCACG 
GGGCCGGGTC 
GGAGGAGAGC 
CCATCAGCTG 
GGAGGGAAAC 
GCTGGGCGCC 
TGTGCTGATC 
AGAGGTGCGC 
CGTGGCGGCG 
GCCGCCGCTG 
CCAGGACGCC
GCACACCCTT 
CGTCTTCGGC 
GGCGGACCTC 
GGAGGGGGGG
GATGCCGACG 
GCGCGTGGTC 
AGGCCGCTTC 
AGCTGTAGAG 
TGTGCGCGGC 
ACGAGGCT7C 
TCAGGGTGAG 
GCGGCTGGTG 
AAGCGTTCGG 
CCCGGGCGGG 
CAGCACCGAG 
AGAGGCTCCG 
CGATGCGGCG 
GCTGCTGCTG 
GCGCGAGAGC 


GGCAAGAGCG 
CACCCGGAGC 
ACCCACCGGC 
GTGGCGCAGG 
AAGCTGGCGT 
TGCGCGGCGT 
GAGCTGGACC 
TTGCTGGACC 
GCGCTGTGGC 
CTCGTGGCGG 
GCGCGCGGGC 
CCGGAGGCGG 
GTCAATGGGC 
GCGGGGTTCG 
TCGCCGCTGA 
CGGCGGCCAA 
AGCGCGCCGG 
AAGGCGCTGC 
CTCGGCCTGT 
GCCGGGCGCG 
GGCTCGGTCA 
TATCCGTGGC 
GCCGCCGATG 
TCCGTGGATT 
GTCGGGGA3G 
GCGCCCGCCG 
GACTGGCAGG 
GCCGAAGAGG 
GCGGTGGGCA 
GTGGGCGGCG 
GCGGCGCTGG 
CCGACGGAGG 
GCATTCCGCC 
GCAGCGCCGG 
CTTGGCCTCC 
AGCCGGCACG 
GCGCGCATTG 
GTCGACGTGG 
CGGGGGGTCG 
GGTCGGCTCC 
ACCCGCGAGC 
TCGATCGGCC 
CGTCGAACGC 
ATGGGCTCGC 
AGTCGTGCCC 
ATCCAGATGG 
TGGGATCGGC 
CGCTGGCGCA 
GTGGTCGCCG 
GCCGACCAGG 
CTGGGTATGC 
GAATACTTGC 
TTGCCGGCGA 
GTCGAGGACC 
GTGCCGGCCG 
AGACAGACCT 
TTCTTCCACA 
GAAGTGAGCT 
CCCACGGGCG 
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TGTTCGTGGG 

CGGCGGGGCT 

TTTTCCTGGG 

TGGCGCTGCA 

GCGGGGTCAA 

TTTCGCCCGG 

AGGGCTGCGC 

TCCTG3CGGT 

TGCCCAGCGG 

TTCCGGCCGA 

TCGAGGTGCG 

TCCTGGGAGC 

TGCTCAAGGC 

agcttcaaccc 

CCTGGCCGCG 
CGAACGCGCA 
AGCGCTCGGC 
CGGCGCGGCT 
TCAGCCTGGC 
GCGACGGGCT 
CGCGCTGCAT 
CGCAGACGCC 
TCGACCGGTG 
GGGCGGAGCC 
CGCTCTTCAC 
AGCTGGTGGC 
CGCTGGAAGA 
CGGGCGGCGC 
CGCACGCGGC 
GCGTGGAGCA 
AGCGGCTGCA 
TCGGGCGGGT 
TGAGCGGGAA 
GGGAGGCGGT 
TCCTCGAAGT 
CGGAGCCGAC 
AGGCGCTGGG 
CGGCTGGGCG 
ACATCGAGCC 
GCG7GGACTG 
GCTGGCTGGT 
CACGTGGACT 
TGGTGACCGA 
TGGAC3CCGT 
CTACCGCGCC 
7CTGGGTCGT 
AGGCGGCGTT 
GGCTCGTGGA 
TGCTCGTCAC 
GCCGGCACGC 
TGTCTGCGGA 
CCCAGTGGCT 
CCGACCGGCA 
TCGAGGCGCT 
TCGAACCGAT 
CCGCTGGCGT 
TGCTCCGTCC 
TCGACCTGTT 
CGTACGCGGC 


CGCGGGCCCC 
CTACAGCCGC 
CCTGCACGGG 
CCTCGGCT3C 
CATGCTGCTC 
CGGGCGGTGC 
CGTGGTGGTG 
GATCCGGGGT 
CCCTGCCCAC
CGTCGAT7TC 
GGCGCTGAGC 
CGCCAAGGCC 
GGTGCTCGCG 
GCTCTTGCCG 
CACGGACCGT 
TGTGGTGCTG 
GGAGCTTTTG 
GCGCGAGCAC 
GACGACGCGC 
GCTGGCGGCG 
CGCGAGCTCG 
GGGCATCGGC 
CGTGGCGCTG 
GGGGAGCGCC 
GGTGGAGTAC 
TGGGCATAGC 
TGGGGTGAGG
GATGGTGTCG 
GTGGGTGTCG
AGCGGTGCAG 
TGTCTCGCAC 
GGCGGCGTCG 
GGTGGTCACG 
GCGCTTCGCG 
GGGCCCGAAG 
GCTGCTGGCG 
CAGGCTGTGG 
GCGGGTGCCG 
TGACAGCCCT
GCCGGACATA 
ATTGGCGGAT 
TCCATGCGTC 
GGCTGCCGGC 
CGTCGGCGCG 
GGTGCICGGC 
GACCCGGGGG 
ATGGGGCATG 
CCTGGATCCC 
CGAGCTATTG
GGCAC3GCTG 
GGCGAGCTAC 
GGTGGAGCTG 
GGCGTGGCGC 
GGAGGCGCGG 
GACAGCGCTG 
CAGCGTCAT3 
CAAGGTGGCC 
CGTGCTGTTC 
GGCCAACGCT 


AACGAA7ATG 
ACCGGCAACA 
CCGACCCTGG 
CAGAGCTTGC 
TCGCCGAAGA 
AAGACGTTCT
CTCAAGCGGC 
ACGGCGATCA 
GAGGCGCTGT 
GTGGAATGCC 
GACGTGTACG 
AACCTTGGGC 
CTGGGGCAAG 
TGGGAGGCG2 
CCGCGCTTCG 
GAAGAGGCGC 
GTGCTGTCGG
CTGGACATGC 
AGCGCCATGA 
CTTTCGGCCG 
TCGCGCGGCA 
CGGGGGCTTT 
TTCGACCGGG 
GAGTCGTTGT 
GCGCTGACGG 
GCCGGGGAGC 
CTCGTGGCGG 
CTCGGAGCGC 
ATCGCGGCGG 
GCGATCGCGG 
GCATCCCACT 
CTGACGTACC 
GACGAGCTGA 
GACGGGGTGA 
CCGACGCTGC 
TCGTTGCGCG
GCCGCCGGCG 
CTGCCGACCT 
CGCCACGCAG 
CCTC3CAGCC 
AAGGGTGGAG 
GTGCTCCATG 
GGTCGAAGCG 
GAGGCGTCGA 
TTGGCTCGGT 
GGATGCATCG 
GGCCGGGTGG 
CGAGCGAGCC 
TCGCAGGAGA 
GTGGCCGCCC 
CTGGTGACGG 
CGAGC3CGGC 
GAGCAGCAGC 
GGTGCACGGG 
GTTTCGTCGG 
CGTCCACTGG 
GGGAGCTGGC 
TCGTCGGGCG 
TTCCTCGACC 


CCGAGCGGGT 
TGCTCAGCGT 
CTGTGGATAC 
GACGGGGCGA 
CCTTCGCGCT 
CGGCCGACGC 
TCTCCGACGC 
ATCATGATGG
TACGCCAGGC 
ACGGGACCGG 
GGCAAGCCCG 
ACATGGAGCC 
AGCAAATACC 
7GCCGGTGGC 
CGGGGGTGAG 
CGGCGGTCCA 
GCAAGAGCGA 
ACCCGGAGCT 
ACCACCGGCT 
TGGCGCAGGG 
AGCTGGCGTT 
GCGCGGCGTG 
AGCTGGACCG 
TGCTCGACCA 
CGCTGTGGCG 
TGGTGGCGGC 
CGCGCGGGCG 
CGGAGGCGGA 
TCAATGGGCC 
CGGGGTTCGC 
CGCCGCTGAT 
GGCGGCCAAG 
GCGCGCCGGG 
AGGCGCTGCA 
TCGGCCT3T7 
CCGGGCGCGA 
GCTCGGTCAG 
ATCCGTGGCA 
GCGCGGATCC 
TCCAGAAATC 
TCGGCGAGGC 
CGCCGGCAGA 
ATTGGCAGGT 
TCGATGACAT 
TTCTGAGCAC 
TTGGCGACGA 
C3GCGCTCGA 
CGCCCCAAGC 
CCGAGGACCA 
CGCCACGGGG 
GAGGCCTCGG 
ACTTG3TGCT 
CGCCTGAGAT 
TGACCGTGGC 
TCGAGCCCCC 
CGGAGACGGA 
TGCTGCACCG 
CAGCGGTGTG 
GGCTCGCGCA 


GCAGGACCTC 
TGCGGCGGGA 
GGCGTGCTCC 
GTGCGACCAA 
GCTCTCACGG 
GGACGGCTAC 
GCAGCGCGAC 
CCCGAGCAGC 
GCTGGCGCAC 
GACGGCGCTG 
CCCTGCGGAC 
CGCGGCGGGC 
AGCCCAGCCG 
GGTGGCCCGC 
CTCGTTCGGG 
GCTGTGGCCT 
GGGGGCGCTC 
CGGGCTCGGG 
CGCGGTGGCG 
GCAGACGCCG 
CCTG7TCACC 
GCCAGCGTTC 
CCCGCTGTGC 
GACGGCGTTC 
GTCGTGGGGC 
G7GCGTGGCG 
GCTGATGCAG 
GGTGGCCGCG 
GGAGCAGGTG 
GGCGCGCGGC 
GGAACCGATG 
CGTTTCGCTG 
CTACTGGGTG 
CGAAGCCGGC 
GCCAGCTTGC 
GGAGGCTCCG 
CTGGCCGGGC 
GCGGCAGCGG 
GACCCAAGGC 
AGAGGAGGCG 
GGTCGC7GCA 
GACATCCGCG 
AGTGCTCTAC 
CGGCGACGCG 
CGTGTCTTGT 
GCCTGCGATC 
GCA7CCCGGG 
CAGCCCGATC 
GCTCGCCTTC 
GGAAGCGGCA 
TGGGCTGGGC 
GACCAGCCGG 
CCGCGCGCGG 
AGCGGTGGAC 
GCTGCGAGGG 
CGAGACCCTG 
GCTGCTGCAC 
GGGTAGCCAT 
TCTTCGGCGT 


GCCGATGAGG 
CGGCTG7CAT 
TCGTCGCTCG 
GCCCTGGTrG 
ATGCACGCGC 
GCGCGGGCCG 
CGCGACCCCA 
GGGCTGACAG 
GCAGGGGTGG 
G-3CGACCCGA 
CGACCGCTGA 
CTGGCCGGCT 
CAGCTGGGCG 
GCAGCGGTGC 
ATGAGCGGAA 
GCCGCGCCG3 
GACGCGCAGG 
GACGTGGCGT 
GTGACGTCGC 
CCGGGGGCGC 
GGACAGGGCG 
CGAGAGGCGT
GAGGTGATGT 
ACCCAGCCCG 
GTAGAGCCGG 
GGGGTGTTCT 
GGGCTCTCGG 
GCGGTGGCGC 
GTGATCGCGG 
GTGCGCACCA 
CTGGAGGAGT 
GTGAGCAACC 
CGGCACGTGC 
GCGGGGACGT 
CTGCCGGAGG 
GGGGTGCTCG 
GTCTTCCCCA 
TACTGGCCCG 
TGGTTCTATC 
AGCCGCGGGA 
GCGCTG7CGA 
ACCGCCGAGC 
CTGTGGGGTC 
ACCCGTCGTG 
TCGCCCCGAC 
GCCCCTTGTC 
GCCTGGGGCG 
GACGGCGAGA 
CGCCATGGGC 
CCGGCGTCGC 
CTGATCGTGG 
CGCGGGTTGC 
ATCGCAGCGG 
GTGGCCGACG 
GTGGTGCACG 
CTCGAGTCG3 
GGCCGGCCTC 
AGCCAGGGTG 
TCGCAATCGC 
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TGCCTGCGTT 
CTCATGCACG 
CGCTCCAGCG 
CGCGCTTCGC 
CAGGGCGCGA 
TGTCCGTTGC 
TGCTGGGCTT 
TCGACTCGTT 
TTTCGACGAC 
TCGATGTACT 
ACGAGCCCAT 
AGTCCTACTG 
GGTGGGATGC 
CCAAAGGCGC 
CTCGCGAGGC 
CGCTCGAGAG 
TGGGTGCGGG 
CAGGGCTGTA 
TCCTGGGTCT 
CGCTGCACCT 
GGG7CAACGT 
CGCCCGACGG 
GGTGCGCCGT 
TGGCCCTGAT 
CCAACGGACC 
CGGTCGACGT 
AGGTGCAGGC 
TGGGGGCCGT 
TCAAGGCCGT 
TCAACCCGCA 
GGGGGCGCGG 
ACGTGCATGT 
GACCGGTGGA 
AACGGCTCTC 
GCCTGGGGAC 
AGGCCCTGCG 
GCGGCAAGGC 
AAATGCCGGG 
ACCGGTGCGT 
CTGCGCCGGG 
TCTTTGCGCT 
TACTCCTCGG 
TCGAAGACGC
GCGGTGCCAT 
ACGCCGCCAC 
CCGAGGTACA 
GGCTCGCCGT 
AGCGGGTCGC 
CCGGCCACGT 
GCGCCGTGCG 
TCGAGATTGG 
ACGCGGTCCT 
CGCTCGGGAC 
GCGCGCGCCG 
TCACCCCGCG 
GGCTCTGCAT 
CCTTCCTCGG 
CGGTGATCCT 
TGGAGTTCCT 


GAGCGTCGCG 
TCTGAGCGAC 
CCTGGTGGAG
GCCGG7GTAC 
CATCATCGCG 
GGAAGCCCGC 
CCTCGACCCG 
GATGGCGGTG 
GCTGGCCTTT 
GAAGCTGGAG 
CGCCATCGTG 
GCAGCTGTTG 
GGCGGACTGG 
CTTCCTGCGC 
GATGAGCCTC 
CGCCCGTATC 
GCCCAA7GAG 
CGGCGGCACC 
GCACGGCCCG 
CGCCTGCCAG 
GCTGCTCGCG 
GCGGTCCAAG
GGTGGTGCTC 
CCGGGGAAGC 
CGCCCAGCAA 
TGATTTTGTG 
GCTGAGCGAG 
CAAGGCCAAC 
GCTTGCGCTG 
CTTGCCGTGG 
CGCACGGCCG 
CGTGCTGGAG 
GCTGGTTGTG 
GGCGCACCTG 
GACGCGCAGC 
AGGCGCGCTG 
CGTGTCCTCA 
CATGGGCCGT 
CGCGCTCTTC 
CCTCGCTCAG 
GGAGTACCCG 
TCATAGCATC
GGTGAGGTTG 
GCTCCCCATC 
GGTGTCGATC 
GGTGCTCGCC 
CTCCCATGCG 
TGCGACGATC 
CGCAGGCCCC 
CTTCGGCGAT 
CCCGAAGCCG 
CGTGCCGTCG 
TTGGTATGCC 
CG7GGCTCTG 
AAGCGCCGCG 
GCCCGGCGC7 
TGATCACCTC 
CAGCATCGCC 
GAAGGCGATC 


TGGGGTCTGT 
ATCGGGGTTC 
ACCGGCGCGG 
ACCGCTCGAG 
CCTTCCCCTC 
ATGGCTCTGC 
AGCGCGCTCG 
GAGATCCGCA 
GATCATCCGA 
GATCGCAGCG 
CCAGCCGCCT 
GCCGAGGGCG 
TACGACCCTG 
GATTTGCAGA 
GACCCGCAGC
GCTCCGGATA 
TACTACACGC 
GGGAACATGC 
ACGCTGGCCA 
AGCCTGCGAC 
CCGGAGACCT 
ACGTTCTCGG 
AAGCGGCTGC 
GCGGTGAACC 
GCATTGCTGC 
CAGTGTCACG 
GTGTATGGTC 
GTCGCGCATC 
CGGCACGAGC 
AACACGCTGC 
CGTCGGGCCG 
GAGGCACCGG 
CTATCGGCCA 
TCCGCGCACC 
CCGATGGAGC 
GACGCCCCGG 
CGCGGTAAGT 
GGGCTGTACG 
GATCGGGAGC 
GCGGCGCGGC 
CTGGCTGCCC 
GGCGAGCTGG 
GTGGCCGCGC 
GCAGCGTCCG 
GCCGCGGTCA 
CTCGGCGCCA 
TTCCACTCGC 
GCGTACCGCG 
GAGATCGCCA 
GGGGCAAAGG 
GTCCTGCTCG 
CTACGCGCGG 
TGGGGGGGTG 
CCCATGTATC 
CCTGCAGGGA 
GTGTTGCACC 
GTGTTTGGCA 
GCCGAGCGCT 
GCGATGGAGC 


GGGCCGAGGG 
TGCCCArGTC 
C7CACCGCAC 
GGCGTCGCAA 
CGGCGGCAGC 
ACGAGGTCGT 
ATCCTGGGA1* 
ACCTCCTTCA 
CGGTACAGCG 
ACACCCAGCA 
GCCGCTTCCC 
TGGTGGTCAG 
ATCCGGAGAT 
GATTGGATGC 
AGCGGTTGCT 
CGCTGCGAGA 
AGCGGCTGCG
TCAGCGTTGC 
TGGATACGGC 
TGGGCGAGTC 
TCGT3C7GCT 
CCGACGCGGA 
CCGATGCGCA 
ACGACGGCCC 
GCCAGGCGCT 
GGACAGGGAC 
CAGGGCGCTC 
TGGAGGCGGC 
AGATCCCGGC 
CGGTGGCGGT 
GCGTGAGCGC 
AGGTGGAGCT 
AGAGCGCGGC 
CGGAGCTGAG 
ACCGGCTCGC 
CGCAGCGGCA 
TGGCTTTCCT 
AGGCGTGGCC 
TCGACCAGCC 
TCGATCAGAC
TGTGGCGTTC 
TCGCCGCCTG 
GCGGGCGGCT 
AGGCCGAGGT 
ACGGTCCTGA 
CGTTCGCGGC 
CGCTCATGGA 
CGCCAGACCG 
CGCCCGAGTA 
CGTTGCATGC 
GGCTATTGCC 
ACCGCTCGGA 
CCCTCGACTG 
CATGGCAGCG 
TCGCAGGTCG 
ACGTGCTCTC 
AGGTGGTGGT 
GGCCCGAGCG 
CCGACCAGGA 


AGGCATGGCG 
GACGTCGGCA 
GGTGACCCGG 
CCTGCTTTCG 
AACCCGGAAC 
CCATGGGGCC 
GGGGTTCAAT 
GGCTGAGCTG 
GCTGGTGGAG 
TGTTCCGTCG 
GGGCGGGGTG 
CGCCGAGGTG 
CCCAGGCCGG 
GACCTTCTTC 
CCTGGAGGTA 
TAGCCCCACC 
AGGCTTCACC 
GGCTGGACGG 
GTGCTCGTCC 
CGATCAAGCG 
CTCACGGATG 
CGGCTACGCG 
GCGCGCCGGC 
GAGCAGCGGG 
TTCGCAAGCA 
GGCGCTGGGC 
CGAGGATCGA 
ATCCGGCTTG 
CCAGCCGGAG 
GCCACGTAAC 
GTTCGGGTTG 
GGTGCCCGCG 
GGCGCTGGAC 
CCTCGGCGAC 
CATCGCGACG 
GACGCCGCAG 
GTTCACCGGA 
AGCGTTCCGG 
TCTGCGCGAG 
CGCGTACGCG 
GTGGGGCGTG 
CGTGGCGGGC 
GATGCAGGCG 
GGCCGCCTCC 
CGCCGTCGTG 
GCGTGGGATA 
TCCGATGCTG 
CCCGGTGGTG 
TTGGGTCCGG 
CGCCGGTGCC
AGCGTGCCTC 
ATGCGAGGTG 
GAAGGGCGTG 
TGAGCGCCAT 
CTGGCCGCTG 
GATCGGACCA 
GCCCGGCGCC 
GGCGA7CGAG 
GGTCGAGCTC 


GACGCGGAGG 

GCGTTGTCGG 

A7GGACTGGG 

GCGCTGGTCG 

TGGCGTGGCC 

GTCGCTCGGG 

GAGCAGGCCC 

GACGTGCGGC 

CATCTGCTCG 

TTGGCGTCAG 

GAGGACCTGG 

CCGGCCGACC 

ACTTACGTGA 

CGCATCTCGC 

AGCTGGGAGG 

GGGGTGTTCG 

GACGGAGCGG 


CTGTCG7TTT 

TCCCTGGTCG 

CTGGTTGGCG 

CGCGCGCTTT 

CGGGGCGAGG 

GACTCCATCC 

CTGACCGTGC 

GGCGTGTCTC 

GACCCGATCG 

CCGCTGCTGC 

GCCAGCCTGC 

CTGGGGGAGC 

GCGGTGCCGT 

AGCGGAACCA 

GCGCCGGCGC 

GCCGCGGCGG 

GTGGCGT7CA 

ACCTCGCGCG 

GGCGCGGTGC 

CAGGGCGCGC 

GAGGCGTTCG 

GTGA7GTGGG 

CAGCCGGCTC 

GA3CCGCACG 

GTGTTCTCGC 

CTGCCCGCCG 

GTGGCACCCC 

ATCGCTGGCG 

CGCACGAAGA 

GAAGACTTCC 

TCGAATGTCA 

CATGTGCGAA 

GCCACGTTCG 

GGGGAAGCGG 

GTCCTCGCGG 

TTCCCCGATG 

TGGATGGACC 

GCTGGTGTCG 

CGCCATCAGC 

TTTCATGTCG 

CTGACAGGCG 

CACGCCGTGC 
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TCACCCCCGA 
AGACCGAACG 
CCGGCGCGTT 
CCGGATTCCT 
TGCAGGACGG 
ACGCCCACGA 
TGC7GGCAAC 
AACGGGTGCG 
CGCAGGCATT 
AGGTGGAGGG 
GCGCGTCGAC 
CTGCGGAACG 
CGGCGCTCGC 
CCCTCGCGGG 
AAGCTCCGGC 
CCGATCGCGC 
AGCGGGTGCA 
GCCCGGAGCT 
CTGACGT7CT 
CCGGAGAGCG 
CTGACGCAGA 
TCGCGCCGGC 
CGGGGCTCAA 
CGATGGGCGG 
CGGTCGGCGA 
GGCTGCTCGT 
CGTTCCTGAC 
TGCTGATCCA 
TAGGGGCCGA 
GCGTGCCGCG 
AGGTCACCGG 
ACGCGAGCCT 
TACGGGATCG 
TCCTGGAGCT 
CTGCGGGACA 
CGTTTCGGTT 
CCGCAGCGCC 
GGCTCCACGT 
GGCGGGGCCT 
CTCGGGTGAC 
AGGCCATTCC 
ATG3TGTGCT 
CTGGCGCCTG 
TCTCCTCCAT 
CCTTCCTCGA 
CG7GGGGCCC 
GGCTCGCTCG 
AGGCGCTGGC 
GCCAAGCTTC 
GCCATGCGGC 
CGCGTCGCGC 
GGGGCGCCGC 
TCACGGCGGT 
CGCTGGCATT 
TGGCCGTGGC 
AGCCCATTGC 
CGTTTTGGCG 
GGGACATCGA 
TTGGCGGCTT 


AGCCGCCGGG 
CCGATGGACG 
GCCGCGCCTC 
CGACAGGTTA 
GCGCGTCGGC 
CGTGGCGCCC 
CCGGAGCGAG 
GTGGTGGCGG 
CGGTGTCTCG 
ATTTGTTTGC 
TGCAGCCTTG 
GATGGAGGAG 
AACACGGCTC 
GGTGTCTCCC 
GGCGGCGCAG 
GGTGCGCCTG 
GGTCGCCACA 
CAGCTGCACT 
GCTGCGGGAG 
CCGCGTAGCG 
ATCCTATCGA 
ACAGCGCCGG 
CTTCCGGACC 
AGATTGTGCC 
TGGTGTCATG 
CCGGCAGCCT 
GGCCTGGCTG 
TGCTGCGGCC 
GGTGT7CGCC 
CACGCACATC 
CGGCCGGGGC 
GTCCCTGC7G 
AGCCGCGG7C 
CGCTCCGGAT 
TCTGCGCGCA 
CA7GCCGCAA 
C7TGGCGCCG 
GGCCCGCTGG 
GGATACGCCG 
GATCGCGGCG 
GGCGGAGTGG 
TGATGAGCAG 
GAATC7GCA7 
GTCGGGGCTC 
CGCGCTGGCC 
ATGGTCCGAC 
GCATGGGATG 
TCGGCCGGAA 
GGGAGCGGCA 
GGCTGGGGCG 
CGACGAGGTG 
GAGCGCCGTG 
GGAGCTGCGC 
CGATCACCCG 
CGAGCCGAGC 
GGTGATCGGC 
GCTGCTCGAA 
CGCGTTC7AT 
CCTGTCCGAT 


GATGGC7ACC 
ACCCACGCCC 
GAGGTCCTGG 
TCGGCGGTGC 
GACGAGGCCT 
TTGCACCCGA 
CCGGAGGACG 
GCGCCGGTTG 
AGCTTCGTGC 
CGCCGGGCGC 
TACCGCCTCG 
AGCTGGGTCG 
AACCGCTGCG 
GCAGGTGTGA 
CGTGTGGCGA 
TGGTGGG7GA 
GCGCCGGTAT 
CTGGTGGATT 
CTCGGTCGCG 
CGGCTGG7CA 
CTGGAGGCTG 
GCACCCGGCG 
CTCCTCGCTG 
GGTATCGTCA 
ACGCTGGGGA 
GCAGGGCTGA 
GCTCTGCACG 
GGCGGCGTGG 
ACGGCGAGCC 
GCCAGCTCGC 
GTGGACGTGG 
ACGACGGGCG 
GCGGCGGCCC 
CGAACTCGAG 
TTGCCGGTGC 
GCGCGGCATC 
ACGGGCACCG 
CTCGCCCAGC 
GGCGCTGCCA 
TCGGATGTCG 
CCGTTACAGG 
ACCACCGACC 
GAGCTCACGG 
TTGGGCTCGG 
GCGCATCCGC
GGAGGCATGG 
GGAGCGCTGT 
ACGCAGCTCG 
GTGCCGCCTG 
CAGGGGGCA7 
CGCAAGGTCG 
CCCGTCGATC 
AACCTCCTCG 
ACGGTCGACG 
GTATCGCCCG 
ATCGGCTGCC 
GAGGGCAGCG 
GATCCGGATC 
ATCGACCGGT 


TGTTCGAGCT 
GCGGTCGGGT 
AGGACCGCCC
GGATCGGCTG 
CGCTTGCCAC 
TCCTGCTGGA 
ACGGGACGCC 
GAAGGGTGCG 
TGGTCGACGA 
CGCGAGAGGT 
ACTGGCCCGA 
TGGTGGCAGC 
TACTCGCCGA 
TCTGCCTCTG 
CCGAGGGCCT 
CCACGGGCGC 
GGGGCCTGGG 
TGGAGCCGGA 
CTGACGACGA 
AAGCGACAAC 
GGCAGAAGGG 
CGGGCGAGGT 
TGCTGGGAAT 
CGGCGGTGGG 
CGTTGCATCG 
CTCCCGCGCA 
ACCTGGGGAA 
GCATGGCCGC 
CGTCCAAGTG 
GGACGCTGGA 
TGCTCAACGC 
GGCGGTTCCT 
ATCCCGGTGT 
AGATCCTCGA 
ATGCGTTCGC 
AGGGCAAGG7 
TACTGCTGAC 
AGGGCGCGCC 
AAGCCGTCGC 
CCGATCGGAA 
GCGTGATCCA 
GCTTCTCGCG 
CGCGCAACGA 
CCGGGCAGTC 
GGGCCGAAGG 
CAGCGGGGCT 
CGCCCGCTCA 
GGGCGATGTC 
TGTGGCGCGC 
7GGCCGCGCG 
TGCAGGCCGA 
GGCCGCTGTC 
GCCAGCGGGT 
CGCTCACGCG 
CAAAGTCGTC 
GTTTCCCAGG 
ATGCCGTCGT 
CGGATGTGCG 
TCGAGCCGGC 


GGCGACCCTG 
GCAGCCGACA 
GATCCAGCCC 
GGGTCCGCTT 
CCTCGTGCCG 
CAACGGCTTT 
CCCGCTGCCG 
GTGTGGCGGC 
AACTGGCGAG 
CTTCCTGCGG 
AGCCCCCTTG 
ACCTGGCTCG 
ACCCAAAGGC 
GGAACCTGGA 
TTCGGTGGTG 
CGTGGCTGTC 
CCGGACAGTG 
GGTCGATGCC 
GACCCAGGTG 
CCCCGAAGGG 
CACATTGGAC 
CGAGATCAAG 
GTATCCGGGC 
CCAGGGGGTG 
ATTCGTCACG 
GGCAGCTACG 
TCTGCGGCGC 
GGTGCAAATC 
GGCAGCGGTT 
GTTTGCTGAG 
GCTGGCCGGC 
CGAGATGGGC 
TCGCTATCGG 
GCGCGTGGTC 
GATCACCAAG 
CGTGCTGCTG 
CGGTGGGCTG 
GCACATGG7G 
GGAGATCGAA 
CGCGCTGGAG 
TGCAGCCGGA 
GGTGCTGGCA 
TCTCGCTTTC 
CAACTATGCG 
CCTGGCGGCG 
CAGCGCGGCG 
GGGCACCGCG 
GC7CGACGTG 
GCTGG7GCGC 
CCTTGGCGCG 
GATCGCGCGC
GGACTTGGGC 
GGGTGCGACG 
CTGGCTGCTC 
GCCGCAGGTC 
CGGCGTGACC 
CGAGGTGCCG 
CGGCAAGATG 
CTTCTTCGGC 


GCGGCGCCGG 
GACGGCGCGC 
CTCGACTTCG 
TGGCGATGGC 
ACCTATCCGA 
GCGGTGAGCC 
rTCGCCGTGG 
GTGCCGCGGT 
GTGGTCGCTG 
CAGGAGTCGG 
CCCGATGCGC 
GAGATGGCCG 
CTCGAGGCGG 
GCCCACGA3G 
CAGGCGCTCA 
GAGGCCGGTG 
ATGCAGGAGC 
GCGCGTTCAG 
GTTT7CCGT7 
CTCTTGGTCC 
CAGCTCCGCC 
GTAACCGCCT 
GACGCTGGGC 
CACCACCTCT 
GTCGACGCGC 
GTGCCGGTTG 
GGCCAGCGGG 
GCCCGA7GGA 
CAGGCCATGG 
ACGTTCCGGC 
GAGTTCGTGG 
AAGACCGACA 
GTATTCGACA 
GAGGGCTTTG 
GCCGAGGCAG 
CCGGCGCCCT 
GGAGCGTTGG 
CTCACAGGTC 
GCGCTCGGCG 
GCTGTGCTCC 
GCGC7CGATG 
CCGAAGGTGA 
TTCGTGCTGT 
GCGGCCAACA 
CAGAGCCTCG 
CTGCAGGCGC 
CTGCTCGGCC 
CGTGCGGCAA 
GCGGAGGCGC 
CTGCCCGAGG 
GTGCTTTCAT 
C7CGACTCGC 
C7GCCGGCGA 
GATAAGGTCC 
GCCCTCGACG 
GATCCGGAGT 
CATGAGCGAT 
ACGACACGCT 
ATCTCGCCGC 
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GCGAAGCGAC 
TCGAGCGCGC 
GGCTCTTCTA 
TAGGCACCGG 
AGGGGCCGAG 
CCTGCCAGGC 
TGCTCACGCC 
GGTGGAAGAG 
TCCTGC7CAA 
GCGGCACCGC 
CGCAGCAAGA 
GCTACGTCGA 
TGGGCGCCGT 
AGTCCAATAT 
TGGCGCTCGA 
TTCCGTGCTC 
GCGCGCCGCG 
TGCTGGAGGA 
TCGTGCTGTC 
ATGTCGTTGC 
GCAGCCCGAT 
CGCTCGACAC 
CAGGCAGCGC 
TGGGCCAAAA 
GA3CGATTCA 
ccrcGCAGcr 
TGTCGGCGCT 
GCGAAGTGGC 
TCTGCCGGCG 
AGCTCTCCCT 
CGGTGAGCAA 
TGCTGGCGAT 
GCCACAGCCC 
AGCCGCGACA 
CGGAGCTCGT 
CGGTGCAA7C 
TCCTGACGAC 
GCTCGTTGCG 
GGGTACACGG 
GTCGCGTGCC 
CCGGCGGCGC 
AAATGCAGAC 
AACGGCTGCC 
CG7ACCTGCA 
TCAGCGATGT 
AGGTCATGGC 
CGGGCCACGG 
GCGCCGAGGT 
CACCCGCTGC 
TCCAGGGGCT 
CCGAGGCCGC 
TCCACGTGAG 
TCGGCTCGCT 
TGAGCCACGG 
CGGGCGCGAT 
GCCGGCGCGA 
GATCCGAGGT 
CGCTCTACTC 
CGAGCGCCGC 


GACCATGGAT 
CGGGATTTTG 
GCAGGAGTAC 
CACCACGGCC 
CCTGACGGTG 
GCTGCCGCGG 
GGCGACGTTC 
CTTCTCGGCC 
ACCGGTTCGC 
GGTGAACCAG 
GGTGATCCGT 
GTGCCACGGC 
GCTGGCACAG 
CGGACATACG 
GCGCGGGCTT 
GGAGCTCGCC 
ACGAGCCGGG 
GGCGCCAGCG 
GGCGAAGAGC 
GCACCCGGAG 
GACGTACCGG 
AGCGGCGCAG 
CCCAAAGGTG 
GCTCCTCTCG 
GGCCGAAGCC 
CGGCCGCATC 
GTGGCGGTCG 
GGCCGCGCAC 
CAGCCTGCTG 
GGCCGAGGCC 
CAGCCCGCGA 
CCTTGCGGCA 
ACAGA7CGAC 
AGCGACCG7G 
GGCCAGCTAC
G7TGATGGAA 
GTCGGTCGAG 
GCGTGGACAG 
CCAGGCGGTG 
GC7GCCGACC 
GGCGAGCGGC 
CCTGTCGACC 
GTGGCTCGGC 
GATGGCGCT7 
GGTGCTCGCC 
GACCGAGGAG 
CCGTGCTGCC 
CCCGGCGAGG 
GGCTACCTAT 
TGTCGAGCTG 
CGGCTCCCCA 
CAGC3CCTTC 
GCGGTGGTTC 
AAAGCCAACA 
CGTCGCCGAG 
AGAAGACGAC 
CACGGCGGGC 
GGCGCTGACG 
CGGGTTGCAG 


CCGCAGCAGC 
CCCGAGCGGC 
GCTGCGCTCC 
AGCGTCGCCT 
GACACCGCGT 
GGCGAGTGT7 
GTGGAGTTCA 
GCAGCCGACG 
GA7GCTCAGC 
GATGGGCGCA 
CGGGCCCTGG 
ACCGGCACCA
GGGCGACCCT 
CAGGCTGCGG 
ATCCCGAGGA 
GTGCAGGTGG 
GTGAGCTCGT 
GCGGCGTTCG 
GCCGCGGCGC 
CTCGGCCTCG 
CTCGCGGTGG
GGGCAGGCGC 
GTTTTCCTCT 
GAGGAGCCCG 
GGCTGGTCGC 
GACGTGGTGC 
TGGGGCG7CG 
GTCGCCGGCG 
CTGCGGCGGA 
GAGGCAGCGC 
TCGACGGTGC 
AAGGGGGTGT 
CCGCTGCGCG 
TCGA7GCGCT 
TGGGCGGACA 
GGCGGTCA7G 
GAGATCGGAC 
GACGAGCGCC 
GGCIGGGAGC 
TATCCCXGGC 
AGCCGCTTTG 
CAGAGGAGCA 
GATCACCGGG 
TCGTCTGGGG 
GAGGCGCTGG 
CCACCAGGCC 
TTTCGAAGCC 
CTGGATCTGG 
GCGGCGCTGG 
TGGCGGGGGG 
GCCGCGTGCC 
GCTGACCGCG 
CAGCGGCCCT 
CCCGATCGGC 
ATCTCCGGCC
TGGTTCATGG 
CGGTGGCTGC 
GAAGCTGGCC 
GCACTCCTGA 


GGCTGCTCCT 
TGATGGGCAG 
CCGGCGGCAT 
CGGGGAGGAT 
GCTCCTCGTC 
CGGTGGCGCT 
GCCGGCTGCG 
GCGTGGGGTG 
GCGATGGGGA 
GCAACGGGCT 
AGCAGGCGGG
CGTTGGGCGA 
CGGACCGGCC 
CGGGCGTGGC 
GCCTGCATTT 
CCGCCAAACC 
TTGGCGTCAG 
CGCCCGCG3C 
TGGACGCGCA 
GCGACCTGGC 
CGGCGACCTC 
CGCCCGCAGC 
TTCCTGGCCA 
TCTTCCGCGA 
TGCTCGCCGA 
AGCCGGCGCT 
AGCCGGATGC 
CCCTGTCGCT 
7CAGCGGCCA 
TCC7GGGCTA 
7GGCGGGCGA 
TCTGCCGTCG 
ACGAGCTATT 
CGACGGTGAC 
ACGTTCGACA 
GGCTGTTCGT 
GGGCGACGAA 
7GTCCATG7T 
GGCTGTTCTC 
AGCGCGAGCG 
CTCATGCGGG 
CGCGCGTGTG 
TGCACGGGGC 
CCGAGGCCTT 
CCTTCGCGGA 
GCCIGCAATT 
A7GCCCGCGG 
CCGCGCTTCG 
CCGAGATGGG 
AGGGCGAGGC 
GGCTCCACCC 
GCGAGGCGAC 
CGGGGGAGCT 
GGAGTACCGA 
TCGTGGCGCA 
AGCCGGCTTG 
TCATCGGCTC 
ATTCCG7CGT 
CGGCGTCCTT 


GGASACGAGC 
CGATACCGGC 
CGAGGCGT7C 
CTCTTATGTG 
GCTGGTCGCG 
GGCCGGCGGC 
AGGCCTGGCT 
GAGCGAAGGC 
TCCGATCCTG 
GACGGCGCCC 
GCTGGCTCCG 
CCCCATCGAA 
GCTCGTGATC 
CGGTGTCATC 
CGACGCGCCC 
CGTCGAATGG 
CGGGACCAAC 
GGCGCGTTCA 
GGCGGCGCGG 
GTTCAGCCTG 
GCGCGAGGCG 
GGCTCGCGGC 
GGGCTCCCAG 
CGCGCTCTCG 
GCTCGCGGCC 
GTTCGCGATC 
AGTGGTAGGC 
CGAGGATGCT 
AGGCGAGATG 
CCAAGATCGG 
GCCGGCAGCG 
AGTCAAGGTG 
GGCAGCA77G 
GAGCACGATC 
GCCGGTGCGC 
GGAGATGAGC 
GCGGGAGGGA 
GGAGGCGC7G 
CGCGGGCGGC 
GTACTGCCTC 
CAGTCACCCG 
GGAGACGACG 
GGTCG7GTTC 
GGGTGACGGT 
TGATACGCCG 
CCACGTTGCG 
GGTGCTGCGC 
TGCCCGGCT7 
GCTCGAGTAC 
GCTGGCACGT 
CGCGCTCTTG 
GCCATGGGTA 
GTGGTGTCAT 
CTTTTGGGTG 
GCGGCTCGCG 
GGAACCGACC 
GGGCGGCGGG 
CCACGCGACA 
CGACGGCCAG 


TGGGAGGCGT 
GTGTTCGTGG 
GATGGCTATC 
C7CGGGCTAA 
GTGCACCrGG 
GTGGCGCTGA 
CCCGACGGAC 
TGCGCCATGC 
GCGGTGATCC 
AACGGG7CGT 
GCGGACGTCA 
GTGCAGGCCC 
GGGTCGGTGA 
AAGGTGGCGC 
AATCCGCACA 
ACGAGAAACG 
GCGCACGTGG 
GCGGAGCTTT 
CTTTCGGCGC 
GCGACGACCC 
CTGTCTGCGG
CACGCTTCCA 
TGGCTGGGCA 
GCGTGTGACC 
GATGAGACCA 
GAGGTCGCGG 
CACAGCA7GG 
GTAGCGATCA 
GCGGTCGTCG 
CTCAGCGTGG 
CTCGCAGAGG 
GACGTCGCCA 
GGCGAGCTCG 
GTGGCGGGCC 
TTCGCCGAAG 
CCGCATCCGA 
GTCGCGGTGG 
GGAGCGCTCT 
GCGGGCGTCC 
GAAGCGCCGA 
CTCCTGGGTG 
CTGGATC7CA 
CCGGGCGCGG 
CCGCTCCAGG 
GTGGCGGTGC 
AGCCGGGTGC 
CAGACCCAGC 
CAGGCCAGCG
GGCCCAGCGT 
GTGCGGCTCC 
GATGCGTGCT 
CCCGTCGAAA 
GCGCGGAGCG 
GTCGACAGCA 
GGAGGTGTAC 
GCGGTCCCCG 
CTCGGCGCTG 
GGGCACGGCA 
GCCCCGACGT 
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CGGTGGTGCA 
ACGCCGATGC 
A3GCCGTGGC 
CrCAGGCCAT 
GOG7TATCGC 
GGCGCGACGG 
AAGTCGCGTT 
CCGACTGCCG 
GGTCCGGCGT 
GCGAGGTCGA 
TGGGGATCTA 
GAATTGTCGC 
TCGCGCCCTT 
CCGCGGCGCT 
ACGGTCTCGT 
CGGGGGGCAG 
CGACCGCTG3 
TGGACTCGCG 
TCGACGTCGT 
TCGCGGACGG 
GGCTCGCTCA 
7GCG7CC-GCC 
GAGCGCTCCA 
GGAAAATGGC 
TGCGGATCCG 
TGACCGGCGG 
C7GGGCATCT 
TCGCCGCGCT 
GGGCGCAGAT 
TCGTTCATGC 
TCCGCGCGGT 
AAGCGCCGCT 
GCCAGGGCAA 
CGCAGGGGCT 
CCGGGCACCA 
ACGAAGGGCT 
TGCCGTTCGA 
TGrCGCGGCT 
TGCTCGAACG 
TGCGCGCGCA 
CGCTCACGAG 
CCGXGCTCGG 
TGAGTGCGCA 
CGGATACAGG 
TGTTCGCCTT 
CGAGAAGGCC 
AACGAGCGCG 
7GCCGCTTCC 
CGCGACGCGA 
GACGTACCGC 
7TCG3TATCG 

GTCGCCTGGG

ACCGGCGTGT 
CCGCGCGAAG 
CGCCTATCGT 
TCATCGCTGG 
GCGCTGGCGG 
ACCCAGGCGC 
GTCCGTGGGG 


CCTCGGCAGC 
CCTCGAGCAG 
CGGGGCGGGC 
CGGCGCCGGC 
CTTGGAGCAC 
AGAGGTCGAT 
TCGCGGCGGT 
A3AGAAAATC 
GCTCGACGAC 
GATCGCCGTC 
CCCTGGGCCC 
GATGGCCGAA 
CAGTTTCGGC 
GACGGCCGCG 
CCATCTGGGG 
CGGGCTCGCT 
TACGCCGGA3 
GTCGCTGGA3 
GTTGAACTCG 
CCGCTTCATC 
CTTTAGGAAG 
CGAGCGCCTC 
GCCGCTTCCG 
GCAAGCGCAG 
CGTTCC3GGC 
TCTGGGTGGG 
GGTGCTGGTG 
CGAGGCGCAC 
CGAGCGGATC 
GGCCGGTATC 
CATGGCGCCC 
OTCCTTCTTC 
CTACGCCGCG 
GCCAGCATTG 
AAATCGCGGC 
GTGGGCGCTC 
CGTGCGGCAG 
GGTGACGGCA 
GCTCGCCACC 
GGTCTCCCAG 
CCTGGGAATG 
CATCACCATG 
TCTGGCTTCT 
GAACGTGGCT 
GATTGATSAG 
AGCTCCTGGA 
ATACCCTGGA 
CCGGCGGAGC 
TCCGGCCGCT 
GCTGGGCGGG 
CCCCCCGGGA 
AGGGGTTCGA 
TCGTCGGCGT 
AGCGGGACGC 
ACACGC7GGG 
TGGCCATTCA 
GAGG3GTCAA 
TGTC3CCCAA 
AGGGCTGCGG 


CTCGATGAGC 
TCGCTGGTGC 
TTCCGAGATC 
GACGTCTCCG 
GCCGAGCT3C 
GAGCTGCTTG 
GAGCGGCGCG 
GAGCCCGCGG 
CTGGTGCTCC 
GAGGCGGCGG 
GGGGACGGTC 
GGTGTCGAGA 
ACCCACGTCA 
CAGGCAGCCG 
AGGCTCCGGG 
GCTGTGCAGA 
AAGCGGGCGT 
TTCGCCGAGC 
CTGTCTGGCG 
GAGCTCGGCA 
AGCCTGTCCT 
GCAGCGCTGC 
GTAGAGATCT 
CATCTCGGGA 
GAATCCGGC3 
CTCGGTCTGA 
GGCCGCTCCG 
GGCGCGCGTG 
CTCCGCGAGG 
CTGGACGACG 
AAGGTCCGAG 
GTGCTCTACG 
GCCAACACGT 
AGCATC3ACT 
GCACGGCTGG 
GAGCGTCTGC 
TGGGTGGAGT 
CGGCGCGTGG 
GCCGAGGCGG
GTGCTGCGCC 
GACTCGCTGA 
CCGGCGACCC 
CATGTCGTCT 
CCAATGACCC 
TCACTCGCGC 
GCGCTTGCGT 
GCTCGAGAAG 
GGGCACTCCG 
CGAGGAGCGC 
GCTGCTCACC 
GGCACGGTCG 
ACACCCC3GC 
CTGCGCCACG 
GTACAGCACC 
GCTGCAGGGA 
CCTCGCCTGC 
CATGCTTCTC 
TGGCCGTTGC 
TCTGATCGTG 


GTGGCGTGCT 
GCGGCTGCGA 
CTCCGCGGTT 
TGGCGCAACC 
GCTGCGCTCG 
CCGAGCTGTT 
TGGCCCGGCT 
AAGGCCGGCC 
GAGCCACGGA 
GGCTCAACTT 
CGGTTGCGCT 
GCCTTCGTAT 
CCATCGACGC 
CGCTGCCC3T 
CCGGCGAGCG 
TCGCCCGCCA 
GGCTGCGCGA 
AAGTGCTGGC 
CCGCGATCGA 
AGACG3ACAT 
ACAGCGCCGT 
TGGCGGAGGT 
TCCCCCTCTC 
AGCTCCTGCT 
7CGCCATCCG 
GCGTGGCTGG 
GTGCGGTGAG 
TCACGGTAGC 
TTACCGCGTC 
GGCTGCTGAT 
GGGCCTTGCA 
CTTCGGGAGC 
TCCTCGACGC 
GGGGCCTGT7 
TCACCCCCGG
TCGACGGCGA 
TCTACCCGGC 
CTTCCGSTCG 
GCGCGC3GGC 
TCCCCGAAGG 
TGGGGCTAGA 
TGCTGTGGAC 
CTACGGGGGA 
ACGAAGTCGC 
GTGCGGGAAA 
GAGGTTACTC 
ACCGAGCCGA 
GAGGCGTTCT 
TGG3CGCTCG 
GAAGCCATCG 
CTCGACCCGC 
ATCCCGCCTA 
GAGTATCTCC 
ACCGGCAACA 
CCTTGCCTGA 
GGCAGCCTGC 
TCCCCCGACA 
CAGACCTTCG 
CTCAAGCGAT 


CGACGCGGAT 
CAGCGTGCTC 
GTGGCTCGTG 
3CCGCTCCTG 
GA7CGACCTC 
GGCCGACGAC 
CGTCCGAAGG 
GTTCCGGCTG 
GCGGCGCCCT 
TCTCGACGTG 
GGGCGCCGAG 
CCGCCAGGAC
CCGGATGGTC 
CGCATTCATG 
CGTGCTCATC 
CCTCGGCGCG 
GCAGGGGATC 
CGCGACGAAG 
CGCGAGCCTT 
CTATGCAGAT 
CGA7CTTGCG 
GGTGGACCTG 
GCGGGGCGCG 
CGCGCTGGAG 
CGCGGACGGC 
ATGGCrGGCC 
CGCGGAGCAG 
GAGGGCAGAC 
GGGGATGCC3 
GCAGCAAAC3 
CCTGCATGCG 
AGGGCTCTTG 
TCTGGCACAC 
CGCGGACGTG 
GACGCGGAGC 
TCGCACCCAG 
GGCGGCATCT 
GCTCGCCGGG 
7iGGAATCC7G 
CAAGCTCGAC 
GCTGCGCAAC 
CTACCCCACG 
TGGGGAATCC 
TTCGCTCCAC 
GAGGTGATTG 
TGGCCCTTCG 
TCGCCATCGT 
GGGAGCTGC'L* 
TAGGTGTCGA 
ACGGCTTCGA 
AGCATCGCTT 
GGTCCCTCGT 
ACGCCGCC’GT 
TGCTCAGCAT 
CCGTCGACAC 
GCGCTCGAGA 
CGATGCGAGC 
ACGCGTCGGC 
TGAGCGACGC 


GCCCCCTTCG 
TGGACCGTGC 
ACACGCGGCG 
GGGCTGGGCC 
GATCCAGCGC 
GCCGACGAGG 
CTGCCCGAGA 
GAGATCGATG 
CCTGGCCCGG 
ATGAGGGCCA 
TGCTCCCGCC 
GTCGTGGCCG 
GCACCTCGCC 
ACGGCCTGGT 
CACTCGGCGA 
GAGA TAT TTG 
GCGCACGTGA 
GGCGAGGGGG 
GCGACCC7CG 
CGCTC3C7GG 
GGTTT3GCCG 
CTCGCACGGG 
GACGCGTTCC 
GACCCGGACG 
ACCTACCTCG 
GACCAGGGGG 
CAGACGGCTG 
GTCGCCGATC 
CTCCGCGGCG 
CCCGCGCGGT 
TTGACACGCG 
GGCTCGCCGG 
CACCGGAGGG 
GGTT7GGCCG 
CTCACCCCCG 
GCCGGGGTCA 
TCGCGGAGGT 
GATCGGGACC 
CAGGAGGTCG 
GTGGAT3CGC 
CGCATCGAGG 
GTGGCAGCGC 
GCGCGCCCGC 
GAAGACGGGT 
CGTGACAGAC 
CAAGACGCTG 
GGGGATCGGC 
CGACGACGGG 
CCCAGGCGAC 
CGCCGCGTTC 
GCTGCTGGAG 
CGGGAGCCGC 
CGCGCACCAG 
CGCCGCCGGA 
GGCGTGCTCG 
GAGCGATCTC 
TCTGGCGCGC 
CAACGGGTTC 
GCGGCGGGAT 
GGGGACCGGA
GGGTTGACGG 
GCCGGCGTCG 
GGCGACCCCA 
GCGCGCTGCG 
GTGGCGGGCC 
AACTTTCGTA 
GAACCCGTGC 
ATGAGCGGGA 
GCGGCCCCCG 
GATGCGCAGG 
GATGTGGCGT 
GCGAGCTCGC 
CCGGGAGCCG 
CCCGGCCAGG 
TTCCGGGCGG 
CTCGGGGAGC 
CCGGTGCTCT 
CCGGAAGCGG 
CTGTCGCTCG 
AGCGGTCAGG 
CGTGGCCATG 
GCAGGCGAGC 
TGGCGGCAGG 
GAGCTGATCG 
ACGGTGACGG 
CT7CGGCAGC 
CTGTTCATCG 
GCGSTCGAGC 
ACGCTGCTGG 
CTGTTCCCCG 
TGCTGGATCG 
TGGTTCTACC 
CATGGGAGCT 
CTGTCGACGC 
GCCGAGCAGG 
TGGGGCCTCG 
CGCCGTGCCA 
CCTCGCTTCT 
CTTTGCCAAG 
TGGGGTGGCC 
GCCGAGCTGC 
GCAGCACGCC 
GAGGGAAGCT 
CTGGXGGAGC 
CAGGCGTCGG 
CTGGAAGC3C 
ATGACGGCGC 
GTCTTCCCCG 
CCCAAGGTGG 
TTCGTGCTGT 
GCGGCCAATG 
T7GAGCCTCG 
CGTCTGAGCG 
CGCCTGGTGA 
GCGCCGGTCT 
GAGCGCACTG 
GCGGAGAGCC 
TTCTCCGACC 


TCTGGGCGCT 
CGCCCAACGT 
AGGCCGAGGC 
TCGAGATCGA 
TGCTGGGCGC 
TGATCAAGSC 
C3CTCAATCC 
CCTGGCCGCG 
CCAACGCGCA 
AGCGCGCTGC 
CAGCCCGGCT 
TCAGCCTGGC 
GCGAGGCGCT 
TGCGTGGGCG 
GCTCGCACTG 
CGCTGGAGGG 
TCTCCGCCGA 
TCGCCATGGA 
TGGTGGGCCA 
AG3ACGCGGT 
GCGAGATGGC 
AGGGTCGGCT 
CGGCGGCGCT 
TGAAGGTGGA 
CGGCGCTGGG 
GCGGGGTGAT 
CGGTGCGC7T 
AGATGAGCCC 
AAGGGGGCGC 
AGGCGCTGGG 
CGGGCGGCAG 
AGGTCGAGCC 
GGACCGACTG 
GGCTGCTGTT 
GCGGACTTTC 
TATCCGAAGC 
ACGCCGTCGT 
CCGCACCCGT 
GGGTGGTGAC 
CGGCGTTGTG 
TCGTGGACCT 
TTTCGCCGGA 
TTGTAGCCGC 
ACCTGGTGAC 
GGGGAGCTCG 
GCGGAGAGCA 
AGGGCGCGCG 
TGCTGGCCGC 
TGCGTCCCCT 
CCGGGAGCTG 
TCTCGTCGGG 
CGTTCCTCGA 
CCTGGGGCCT 
ACATCGGAGT 
ACACCAGCGC 
ATGCCGCGCG 
CGTCTCCCCC 
GCTCAGCCCT 
CGGGCGCGCT 


GATCCGAGGA 
GCTCGCCCAG 
CATCGGTTAC 
AC-CGCTGCGC 
GGTGAAGACC 
TACACTTTCG 
GCGGATCCGG 
GACGGGCCGG 
TGTGGTGTTG 
GGAGCTGTTC 
GCGGGACCAC 
GACGACGCGC 
GCGAGGGGC3 
GGCCTCCGGC 
GGTGGGCATG 
TTGCGACCGG 
CGAGGCCGCC 
AGTAGCGCTT 
CAGCATGGGC 
CGCGATCATC 
GCTGGTCGAG 
GAGCGTGGCG 
CTCGGAGGTG 
CGTCGCCAGC 
GGCGATCCGG 
CGCGGGTCCG 
CGCTGCGGCG 
GCACCCGATC 
TGCGGTGGGC 
GACGCTGTGG 
GCGGGTTCCG 
TGACGCCCGC 
GCCCGAGGTG 
GGCCGACAGG 
CTGCACCGTG 
TGCCAGTCGC 
CGATGCTGGG 
CCTTGGGCTG 
CCGCCGGGCA 
GGGCCTCGCG 
GGATCCTCAG 
CGCCGAGGAT 
CCCGCCGGAG 
GGGTGGGCTG 
ACATCTGGTG 
GCCGCCGGAG 
GGTGACCGTG 
CATCGAGCCC 
GGCGGAGACG 
GCTGCTGCAC 
CGCGGCGGTG 
CGGGCTCGCG 
ATGGGCCGAG 
CCTGCCCATG 
TGTCCAGCGT 
AGGGCGGCGC 
GGTGCCGACG 
CTACGAGCTC 
CGACGl’CGGC 


TCGGCCATCA 
GGCCCGCTCT 
ATCGAGACCC 
ACCGTGG TGG 
AACCTCGGCC 
CTRCATCACG 
ATCGAGGGGA 
ACGCGCTTCG 
GAGGAGGCGC 
GTCCTGTCGG 
CTGGAGAAGC 
AGCGCGATGG 
CTTTCGGCCG 
GGCAGCGCGC 
GGCCGAAAGC 
GCCATCGAGG 
TCGCAGCTCG 
TCTGCGCTGT 
GAGGTGGCCG 
TGCCGGCGCA 
CTGTCGCTGG 
CTGAGCAACA 
CTGGCGGCGC 
CATAGCCCGC 
CCGCGAGCGG 
GAGCTCGGTG 
GCGC AAGCGC 
CTGGTGCCGC 
TCGCTGCGGC 
GCGTCCGGCT 
CTGCCGACCT 
CGCCTCGCCG 
CCCCGCGCCG 
GGTGGGGTCG 
CTTCATGCGT 
CGAAACGACT 
GCATCGGCCG 
GTTCGATTCC 
TGCACGGTGG 
CGCGTCGTGG 
AAGAGCCCGA 
CAACTGGCGT 
GGCGACGTCG 
GGTGoCCTTG 
CTCACCAGCC 
GCCCGCGCGC 
GCAGCGGTGG 
CCGTTGCGCG 
GACGAGGCCC 
CGGCTGCTGC 
TGGGGTGGCA 
CACCATCGCC 
GGAGGCGTGG 
GCCACGGGGC 
TCGGTCACAC 
AACTTGCTTT 
GCAAACCGGA 
GTTCGCGGCA 
CGAGGCTTCG 


ATCAGGACGG 
TGCGCGAGGC 
ACGGGGCGGC 
GGCCGGCGCG 
ACCTGGAGGG 
AGCGCATCCC 
CCGCGCTCGC 
CGGGAGTGAG 
CGGCGGTGGA 
CGAAGAGCGT 
ATGTCGAGCT 
AGCACCGGCT 
CAGCGCAGGG 
CGAAGGTGGT 
TCATGGCCGA 
CGGAAGCGGG 
GGCGCATCGA 
GGCGGTCGTG 
CGGCGCACGT 
GCCGGCTGCT 
AGGAGGCCGA 
GCCCGCGCTC 
TGACGGCCAA 
AGCTCGACCC 
CTGCGGTGCC 
CGAGCTACTG 
TGCTGGAAGG 
CCCTGGACGA 
GAGGGCAGGA 
ATCCGGTGAG 
ATCCCTGGCA 
CAGCCGACCC 
CCCCGAAATC 
GCGAGGCGGT 
CGGCTGACGC 
GGCAGGGAGT 
ACGAAGTCAG 
TGAGCGCTGC 
GCGGCGASCC 
CGCTGGAGCA 
CGGAGATCGA 
TCCGCAGCGG 
CACCGATATC 
GTCTGCTCGT 
GGCACGGGCT 
GCATCGCAGC 
ATGTCGCCGA 
GGGTGGTGCA 
TGCTGGAGTC 
GCGACCGGCC 
AAGGCCAAGG 
GCGCGCACTC 
TTGATGCAAA 
CGGCCTTGTC 
GGATGGACTG 
CGGCTCTGGT 
TCTGGCGCGG 
TCGTCGCCCG 
CCGAGCAGGG 


CCGGTCGACG 
GCTGCGGAAC 
GACCTCGCTG 
AGCCGACGGA 
CGCTGCCGGC 
GAGGAACCTC 
GTTGGCGACC 
CTCGTTCGGG 
GCCTGAGGCC 
GGCGGCGCTG 
TGGCCTCGCC 
G3CGGTGGCC 
GCATACGCCG 
CTTCGTGTTT 
AGAGCCGGTC 
CTGGTCGCTG 
CGTGGTTCAG 
GGGAGTGGAG 
GGCCGGCGCG 
GCGGCGGATC 
GGCGGCGCTG 
GACCGTGCTC 
GGGGGTGTTC 
GCTGCGCGAA 
GATGCGCTCG 
GGCGGACAAT 
TGGCCCCACG 
GATCCAGACG 
CGAGCGCGCG 
CTGGGCTCGG 
GCACGAGCGG 
CACCAAGGAC 
GGAGACAGCT 
CGCTGCAGCG 
CTCCACCGTC 
CCTCTACCTG 
CGAGGCTACC 
GCCCCATCCT 
ACAGGTCTCT 
TCCCGCTGCC 
GCCCCTGGTG 
TCGCCGGCAC 
GCTGTCCGCG 
GGCTCGGTGG 
GCCAGAGCGA 
GGTCGAGGGG 
GGCCGATCCC 
CGCCGCCGC-C 
GGTGCTCCGT 
TCTCGACCTG 
CGCATACGCC 
CCTGCCGGCG 
GGCTCATGCA 
GGCGCTGGAG 
G3CGCGCTTC 
CGCGCAGGAC 
CCTGTCCGTT 
GGTGCTGGGC 
GCTCGACTCC 
1 CTGATGGCTC
1 ACTCTGGCCT 
1 CTGAAGCTGG 
1 ATCGCCATCG 
1 TGGCGGCATC 
1 GCGGCGGACT 
1 GCCTTCCTCC 
1 GCGATGAGCC 
1 CGCGCTGGCC 
1 ATCGGGAGCG 
L GGCACCACCG 
L CACGGCCCGA 
L GCCTGCCAGA 
L CTTTTGTCGC 
L CGGTGCAAGA 
L GTGGTGCTCA 
L CGGAGCACGG 
L GCCCAGCAGG 
l GATTTCGTGG 
L CTGGGCGCGG 
. AAGGCCAACC 
. TTGGCGCTGG 
. ATCCCGTGGG 
. GCGCGCCCGC 
. GTGTTGGAGG 
CTGTTCG'i'CC 
GATCATCTGG 
ACGCGCAGCG 
GGGGCGCTTT 
TCCGGCGGCA 
GGCATGGGCC 
GACCGGGCCA 
GCCGCCTCGC 
GCGCTTTCAG 
ATGGGCGAGG 
ATCATC7GCC 
GTCGAGCTGT 
GTGGCGGTGA i 
GAGGTGCTGG < 
GCCAGCCATA < 
ATCCGGCCGC I 
GGTCCGGAGC ' 
GCGGCGGCGC j 
CCGATCCTGG ' 
GTGGGCTCGC ' 
CTGTGGGCGT ( 
GTTCCGCTGC < 
GGGTCGAAGC ( 
CTCGGGGCTC C 
AGCGACGAGA C 
AGCGCGGCGT ; 
CTCGTGCTGG } 
ATCGTGCAAG 1 
CGTGASGAGG C 
AGCTCAGCAG 1 
GTCCTGTCGT C 
TCCTTCCAGG C 
TTGCCAGAAG l 
GCATGTTTTC / 


C TGGAGATCCG 
r TCGACCACCC 
G AGGACCGGAG 
3 TCGGTGCCGC 

2 TGGCCGAGGG 
r GGTACGACCC 
: GCGATGTGCG 
: TGGACCCGCA 
: AGGACCCGAT 

3 AGCACGCCGA 
3 GCAACCTGCT 
V CGATGACGGT 
\ GCCTGCGATT 
; CGCGGTCATT 
i CGTTCTCGGC 
i AGCGGCTCCG 
5 CGATCAACCA 
J CGTTGCTAGG 
J AGTCCCACGG 

I TGTATGGCCG 
: TCGGCCACCT 
\ AGCACGAGCA 
i CAGAGCTGCC 
: GTCGTGCAGG 
; AGGCGCCGGC 
' TGTCGGCGAA 
AGAAGCATG? 
CGATGGAGCA 
CGGCCGCAGC 
GCGCGCCGAA 
GAAAGCTCAT 
TCGAGGCGGA 
AGCTCGGGCG 
CGCTGTGGCG 
TTGC3GCGGC 
GGCGCAGCCG 
CGCTGGAGGA i 
GCAACAGCCC i 
CGGCGCTGAC i 
GCCCGCAGGT < 
GAGCGGCTGC I 
TCGGTGCGAG i 
AAGCGCTGCT < 
TGCCGCCTCT ( 
TGCGGCGAGG < 
CCGGCTATCC ( 
CGACCTATCC < 
CCTCGCTGCG ( 
CATTGCTCGT < 
GGCTATCCTA 1 
ATGTAGAGAT ( 
AGCAGCTGGC ( 
TGGCCCTCAG C 
CAGGTAGAAG < 
TGGGAGCGTT C 
CGGAGGCGCT C 
GTGTG3AGCA G 
ACATGGCATC C 
AAGTGCTGAC C 


3 TAACCGCCTT 
: GACGGTGGAG 
3 CGACACCCGG 
: CTGCCGGTTC 
3 CATGGTGGTC 
: CGATCCGGAG 
3 CAGCTTGGAT 
V ACAGCGGC’l'G 
? GGCGCTGCGC 
^ GCGGGTGCAG 
’ CAGCGTCGCC 
’ GGACACCGCG 
’ GGGCGAGTGC 
’ CGTCGCGGCA 
: CGCTGCAGAC 
I TGACGCGCAG 
i CGATGGCCCG 
; CCAC-GCGCTG 
1 GACGGGGACA 
; GGGCCGCCCC 
GGAGGCCGCG 
. GATTCCGGCT 
AGT3GCCGTT 
CCTGAGCGCT 
GGTGGAGCCT 
GAGCGTGGCG 
CGAGCTTGGC 
CCGGCTGGCG 
GCAGGGGCAT 
GGTGGTCTTC 
GGCCGAAGAG 
AGCGGGCTGG 
CATCGACGTG 
GTCGTGG3GA 
GCACGTGGCC 
GCTGCTGCGG 
GGCCGAGGCG 
GCGCTCGACC \ 
GGCCAAGGGG * 
CGACCCGCTG I 
GGTGCCGATG < 
CTACTGGGCG < 
GGAAGGTGGC < 
GGACGAGATC < 
GCAGGACGAG ( 
GGTGAGCTGG ( 
CTGGCAGCAC ( 
GCTTCGGCAG ( 
CTCGCCCCGA < 
TCTTTCGGAA C 
GGCGCTCGCC C 
GCTCGAGCGA C 
CGAAGAAGGG ( 
CTGGGTTCGG C 
GAAGGAAGCT C 
CTATCCGCTG C 
GGTGTGGCTC C 
CTCAAGTGGC C 
CGCGCTGCTC P, 


r CAGCGCGAGC 
3 CGGCTGGTGG 
3 CACATCCGGT 

2 CCGGGCGGGG 
: AGCACCGAGG 

3 GTTCCGGGCC 
P GCGGCGTTCT 
3 TTGCTGGAGG 
: GAGAGCGCCA 
3 GGCCTCGACG 
: GCTGGACGGC 
3 TGCTCGTCGT 

GACCAG3CAC 
TCGCGCATGC 
: GGCTTTGCGC 
; CGCCACCGCG 
I AGCAGCGGGC 
f GCGCAAGCGG 
i GCGCTGGGTG 
: GCGGAGCGGC 
! GCGGGCTTGG 
‘ CAACCGGAGC 
GTCCGCGCGG 
TTCGGCCTGA 
GAG3CCGCGG 
GCGCTGGATG 
CTCGGCGATG 
GTGGCCGCGA 
ACGCCGCCGG 
GTGTTTCCCG 
CCGGTCTTCC 
TCGCTGCTCG 
GTTCAGCCGG 
GTGGAGCCGG 
GGCGCGCTGT 
CGGATCAGCG 
3CGCTGCGTG 
GTGCTCGCAG 
CTCTTCTGGC 
CGCGAAGAGC 
CGCrCGACGG 
GACAATCTTC < 
CCCACGCTGT 
CACACGGCGG ' 
CGCGCGACGC ‘ 
GCTCGGCTGT ' 
GAG CGG TACT < 
CTTCATAACG ( 
CCCGGAGCTC J 
CATAGGGTCC J 
GCCGGCGTAG J 
GCCCTCGCCG : 
CCCGGTCGGG < 
CACGCCACGG ( 
CCGTGGGAGA 1 
CTCAACGACC ; 
GGCACGGGGG / 
GCC7ATCGGA *1 
ACCACGCCGG / 


: TGGGCGAACG 
3 CGCATCTCCT 
r CGGTGGCGGC 
3 ATGAGGGCCT 
3 TGCCAGCCGA 
3 GGACCTATCT 
r TCTCCATCTC 
3 TGAGCTGGGA 
V CGGGCGTGTT 
3 ACGACGCGGC 
; TGTCCTTCTT 
’ CGCTGGTGGC 
: TGGCCGGCGG 
: GTTTGCTT7C 
: GGGCCGAGGG 
i ACCCCATCCT 
: TCACGGTGCC 
; GCGTGGCACC 
i ACCCGATCGA 
: CGCTCTGGCT 
1 CCGGCGTGCT 
TCGACGAGCT 
CGGTCCCCTG 
, GCGGGACCAA 
CCCCCGAGCG 
CCCAGGCAGC 
TGGCGTTCAG 
GCTCGCGCGA 
GAGCCGTGCG 
GCCAGGGCTC 
GGGCGGCGCT 
GGGAGCTCTC 
TGCTCTTCGC 
AAGCGGTGGT 
CGCTCGAGGA 
GTCAGGGCGA 
GCCATGAGGG 
GCGAGCCGGC i 
GGCAGGTGAA i 
TGGTCGCGGC : 
TGACGGGCGG i 
GGCAGCCGGT \ 
TCATCGAGAT ( 
TCGAGCAAGG ( 
TGCTGGAGGC < 
TCCCCGCGGG ( 
GGATCGAGGA ( 
GCGCCACGGA ( 
ACTTGTGGGA ( 
ATGGCGAAGC ( 
ATCTCTATGG ( 
TGCCTTCCGA l 
CCTCATTCCA C 
GGCACGTGTG 1 
TTCAACAGCG / 
ACGCCCTCGA C 
AGGTGCTCGG C 
TTCATCCCGC C 
AATCCATCGA G 


3 GCTGTCGGCG 
r CACCGACGTG 
: GGATGACGAC 
r GGAGACATAC 
\ CCGGTGGCGC 
P GGCCAAGGGG 
I CCCTCGTGAG 
^ GGCGATCGAG 
P CGTGGGCATG 
: GTTGCTGTAC 
f CCTGGGTCTG 
: GTTGCACCTC 
; GTCCACCCTG 
: GCCAGATGGG 
I CTGCGCCGTG 
' GGCGGTGGTC 
: CAGCGGTCCT 
: GGCCGAGGTC 
l GGTGCAGC-CG 
1 GGGCGCTGTC 
CAAGGTGCTC 
CAACCCGCAC 
GCCCCGCGGC 
, CGC3CATGTG 
CGCTGCGGAG 
CCGGCTGCGG 
CCTGGCGACG 
GGCGCTGCGA 
TGGGCGGGCC 
GCAGTGGGTG 
GGAGGGTrGC 
CGCCGACGAG 
CGTGGAAGTA 
GGGCCACAGC 
TGCGGTGGCG 
GATGGCGCTG 
TCGGCTGAGC 
GGCGCTCTCG 
GGTGGACGTC 
3CTGGGAGCG 
GGTGATTGCG 
GCGCTTCGCT 
GAGCCCGCAC 
GGGCGCTGCG 
GCTGGGGACG 
CGGCAGGCGG 
CAGCGTGCAT 
CCATCCGCTG 
GCAAGCGCTG 
CGTGTTGCCC 
CGCGGCGACG 
AGGCGGACGC 
GGTATCGAGC 
TAGCGACCAG 
ATGTCCGAGC 
CTATGGCCCC 
CCGGGTACGC 
CTTGTTGGAT 
GATTCGGAGG 
45841 CGGCTGACGG ATCTCCACGA ACCGGATCTC CCGCGGTCCA GGGCTCCGG7 GAATCAAGCG
45901 GTGAGTGACA CCTGGCTGTG GGACGCCGCG C7GGACGGTG GACGGCGCCA GAGCGCGAGC 
45961 GTGCCCGTCG ACCTGGTGCT CGGCAGCTTC CACGCGAAG7 GGGAGGTCAT GGATCGCCTC 
46021 GCGCAGACGT ACATCATCCG CACTCTCCGC ACATGGAACG TCTTCTGCGC IGCTGGAGAG 
46081 CGTCACACGA TAGACGAGTT GCTCGTCAGG CTCCAAATCT CTGCTGTCTA CAGGAAGGTC

46141 ATCAAGCGAT GGATGGATCA CCTTGTCGCG ATCGGCGTCC TTGTAGGGGA CGGAGAGCAT 
46201 CTTGTGAGCT CTCAGCCGCT GCCGGAGCAT GATTGGGCGG CGGTGCTCGA GGAGGCCGCG 
46261 ACGGTGTTCG CGGACCTCCC AGTCCTACTT GAGT3GTGCA AGTTTGCCGG GGAACGGCTC 
46321 GCGGACGTGT TGACCGGGAA GACGCTGGCG CTCGAGATCC TC7TCCCTGG CGGCTCGTTC 
46381 GATATGGCGG AGCGAATCTA TCAAGATTCG CCCATCGCCC GTTACTCCAA CGGCATCGTG

46441 CGCGGTGTCG TCGAGTCGGC GGCGCGGGTG GTAGCACCGT CGGGAACGTT CAGCATCTTG 
46501 GAGATCGGAG CAGGGACGGG CGCGACCACC GCCGCCGTCC TCCCGGTGTT GCTGCCTGAC 
46561 CGGACAGAAT ACCATTTCAC CGATGTTTCT CCGCTCTTCC TTGCTCGTGC GGAGCAAAGA 
46621 7TTCGAGATC A7CCArTCCT GAAGTATGGT ATTCTGGATA 7CGACCAGGA GCCAGCTGGC 
46681 CAGGGATACG CACATCAGAA GTTCGACG7C A7CGTCGCGG CCAACGTCA7 CCATGCGACC

46741 CGCGATA7AA GAGCCACCGC GAAGCGTCTC CTG7CGTTGC TCGCGCCCGG AGGCCTTCTG 
46801 GTGCTGG7CG AGGGCACAGG GCATCCGATC TGG77CGA7A TCACCACGGG A7TGATCGAG 
46861 GGGTGGCAGA AGTACGAAGA 7GATCTTCGT ACCGACCA7C CGC7CCTGCC TGCTCGGACC 
46921 TGG7GTGACG TCCTGCGCCG GGTAGGCTTT GCGGATGCCG 7GAGTCTGCC AGGCGACGGA 
46981 TCTCCGGCGG GGATCC7CGG ACAGCACGTG A7CC7C7CGC GCCC7CCGGG CATAGCAGGA

47041 GCCGCTTGTG ACAGCTCCGG TGAGTCGGCG ACCGAATCGC CGGCCGCGCG TGCAGTACGG 
47101 CAGGAATGGG CCGA7GGCTC CGGTGACGGC GTCCATCGGA TGGCGTTGGA GAGAATG7AC 
47161 TTCCACCGCC GGCCGGGCCG GCAGG7TTGG GTCCACGG7C GAT7GCG7AC CGGTGGAGGC 
47221 GCGTTCACGA AGGCGC7CAC TGGAGATCTG CTCCTGTTCG AAGAGACCGG GCAGGTCGTG 
47281 GCAGAGGTTC AGGGGCTCCG CCTGCCGCAG CTCGAGGCT7 C7GCTTTCGC GCCGCGGGAC 
47341 CCGCGGGAAG AG7GGTTGTA CGCC77GGAA TGGCAGCGCA AAGACCCTAT ACCAGAGGCT 
47401 CCGGCAGCCG CGTCTTCT7C CACCGCGGGG GCTTGGCTCG TGCTGATGGA CCAGGGCGGG 
47461 ACAGGCGCTG CGCTCGTATC GCTGC7GGAA GGGCGAGGCG AGGCGTGCG7 GCGCGTCGTC 
47521 GCGGGTACGG CA7ACGCCTG CCTCGCGCCG GGGC7GTATC AAGTCGATCC GGCGCAGCCA 
47581 GATGGCTTTC ATACCCTGCT CCGCGATGCA TTCGGCGAGG ACCGGATGTG CCGCGCGGTA

47641 GTGCATATGT GGAGCCTTGA TGCGAAGGCA GCAGGGGAGA GGACGACAGC GGAGTCGCT7 
47701 CAGSCCGATC AACTCC7GGG GAGCCTGAGC GCGC7T7CTC TGGTGCAGGC GCTGG7GCGC 
47761 CGGAGGTGGC GCAACATC-CC GCGACTTTGG CTCTTGACCC GCGCCGTGCA TGCGG7GGGC 
47821 GCG5AGGACG CAGCGGCCTC GGTGGCGCAG GCGCCGG7GT GGGGCCTCGG TCGGACGC7C 
47881 GCGCTCGAGC ATCCAGAGCT GCGGTGCACG CTCGTGGACG 7GAACCCGGC GCCGTC7CCA

47941 GAGGACGCAG C7GCACTCGC GG7GGAGCTC GGGGCGAGCG ACAGAGAGGA CCAGATCGCA 
48001 T7GCGCTCGA ATGGCCGCTA CG7GGCGCGC C7CC-TGCGGA GCTCCTTTTC CGGCAAGCCT 
48061 GCTACGGA77 GCGGCATCCG GGCGGACC-GC AG7TA7GTGA TCACCGATGG CA7GGGGAGA
48121 GTGGGGCTCT CGGTCCCGCA ATGGATGGTG ATGCAGGGGG CCCGCCA7GT GGTGCTCGTG 
48181 GA7CGCGGCG GCGCTTCCGA CGCCTGCCGG GATGCCCTCC GG7CCATGGC CGAGCC7GGC

48241 GCAGAGGTGC AGATCGTGGA GGCCGACGTG GCTCGGCGCG TCGATGTCGC TCGGCTTCTC 
48301 TCGAAGATCG AACCGTCGAT GCCGCCGCT7 CGGGGGATCG TGTACGTGGA CGGGACC7TC 
48361 CAGGGCGACT CC7CGA7GCT GGAGCTGGA7 GCCCA7CGCT TCAAGGAG7G GATGTATCCC 
48421 AAGGTGCTCG GAGCGTGCAA CC7GCACGCG CTGACCAGGG ATAGA7CGCT GGACTTCTTC 
48481 GTCCTGTACT CC7CGGGCAC CTCGCTTCTG GGCTTGCCCG GACAGGGGAG CCGCGCCGCC

48541 GGTGACGCCT 7C77GGACGC CATCGCGCAT CACCGG7GTA GGC7GGGCC7 CACAGCGATG
48601 AGCA7CAAC7 GGGGATTGC7 CTCCGAAGCA TCATCGCCGG CGACCCCGAA CGACGGCGGC 
48661 GCACGGC7CC AATACCGGC-G GATGGAAGGT CTCACGCTGG AGCAGGGAGC GGAGGCGCTC 
48721 GGGCGCTTGC TCGCACAACC CAGGGCGCAG GTAGGGGTAA TGCGGCTGAA TCTGCGCCAG

48781 7GGCTGGAGT TCTATCCCAA CGCGGCCCGA CTGGCGC7GT GGGCGGAG7T GCTGAAGGAG

48841 CGTGACCGCA CCGACCGGAG CGCGTCGAAC GCA7CGAACC TGCGCGAGGC GCTGCAGAGC 
48901 GCCAGGCCCG AAGA7CGTCA GTTGGT7CTG GAGAAGCACT TGAGCGAGC7 GT7GGGGCGG

48961 GGGCTGCGCC TTCCGCCGGA GAGGATCGAG CGGCACGTGC CG77CAGCAA TCTCGGCATG
49021 GACTCGTTGA 7AGGCCTGGA GCTCCGCAAC CGCATCGAGG CCGCGCTCGG CATCACCG7G 
49081 CCGGCGACCC 7GC7ATGGAC TTACCCTACC GTAG7AGCTC 7GAGCGGGAA CCTGCTAGAT

49141 ATTCTGTTCC CGAA7GCCGG CGCGAC7CAC GCTCCGGCCA CCGAGCGGGA GAAGAGCTTC 
49201 GAGAACGATG CCGCAGATCT CGAGGCTCTG CGGGG7ATGA CGGACGAGCA GAAGGACGCG 
49261 TTGCTCGCCG AAAAGCTGGC GCAGCTCGCG CAGATCGT7G GTGAGTAAGG GACTGAGGGA 
49321 G7A7GGCGAC CACGAATGCC GGGAAGCT7G AGCATGCCCT 7CTGC7CATG GACAAGCTTG 
CGAAAAAGAA
GCTGCCGCTT 
GCCGAGACGC 
AGGAGGTGCC 
TCT7TGGCAC 
AGGTCACCTG 
GCACCGGGGT 
GGCGCGAGGA 
GGTTGTCTTA 
CGTCGCTCGT 
CGCTGGCGGG 
TCCAGGCGCT 
TCCGTGGGGA 
GCGATCGGAT 
GGTTGATGGC 
CTCGCGTCGA 
GCGACCCGAT 
GCCGCTGCGT 
TGCCCGGTTT 
ATTTTCACAC 
AGCCGGTGCC 
TCAGCGGCAC 
CGACGCCGGG 
ACGCACACCC 
ACGTCGCGTT 
CGACCTCGCG 
CAGGCGCGGC 
GGCAG3GCGC 
GCGAGACCTT 
AGGTGATGTG 
CCGAGCCGGC 
TGGAGCCGGA 
GTG7GTTCTC 
CGCTGCCGGC 
CGGTGGCGCC 
TGATCGCGGG 
CGCGAACCAA 
TGGAGGCGTT 
TGAGCAACCT 
GTCACGCGCG 
CGGGCATCTT 
TGCCGGATGC 
GCGCGCTGGA 
TCTTCCCTTC 
ACTGGATCGA 
ACCACCCCGT 
AGACGACGCT 
TCGTGTTTCC 
GCGATGGACC 
ATACGGCGGT 
AGGTAGCGAG 
TGCTGCGCCG 
CCCGGCTTCA 
TTCAATACGG 
TGGGCAGAGT 
TGCTGCTGGA 
CGCCGTGGGC 
TATGGTGCCA 
ACTTTGACTT 


CGCG7CTTTG 
CCCCGGCGGA 
GGTCCAGCCG 
GCGCTGGGCC 
CTCGCCTCGG 
GGAAGGGCTC 
ATTCCTGGGC 
GCAGGACGCG 
TACGCTAGGG 
GGCCATCCAC 
GGGCGTCAAC 
GTCGCCCGAT 
GGGCTGCGG? 
CTGGGCTCTG 
ACCCAATGTG 
CGCCGCCGCC 
CGAGGTCGAT 
GCTGGGCGCA 
GATCAAGGCG 
GCTCAATCCG 
GTGGCCGCGG 
CAACGTCCAT 
GCGCTCAGCA 
GGCGCGGCTC 
CAGCCTGGIA 
CGAGGCGCTG 
GCGCGGCAGG 
GCAGGTGCCC- 
CGACCGG7GC 
GGCCGAGCCG 
GCTCTTTGCG 
GCTCATCGCT 
CCTCGAGGAC 
CGGCGGTGCG 
GCACGCAGCG 
CGCCGAGAAA 
ACCGCTGCAT 
CCGGCGG GTG 
GAGCCCGAAG 
AGAGGCGG7G 
CGTCGAGGTG 
CAGGCCGGTG 
GGCGCTGGGT 
GGGCGGACGG 
AGCGCCGGTC 
TCTGGGTGAA 
GGACCGAAAG 
TGGCGCCGGG 
GATCCAGGTC 
ACCGGTCCAG 
TCGGGAGCCG 
GGTCGGGCGC 
TGCCGCCGTG 
CCCGGCGTTG 
GAGACTGCCT 
CGCGTGCGTC 
GCCGGTGGAG 
TGCGCGCGTC 
GATGGACGGT 


gagcaagagc 

GCGGACACTC 
CTCGACCGGC 
GGACTGCTCA 
GAGGCGCGGT 
CAGGACGCCG 
GCATGCAGCA 
TACGACATCA 
CTGCAGGGAC 
CTTGCCTGCC 
ATGCTCCTTT 
GGCCACTGCC 
ATGGTCGTGC 
ATCCGGGGTT 
CTCGCTCAGG 
ATCGATTATG 
GCGCTGCGTG 
GTGAAGACCA 
GCGCTGGCTC 
CGGATCCGGA 
GCGGGCCGAC 
GTCGTGCTGG 
GAGCTTTTCC 
TCAGCGCACA 
GCGACGCGGA 
CGAAGCGCGC 
GCCGCTTCCT 
GGCATGGGCC 
GTCACGC7CT 
GGCAGCAGCA 
CTGGAGTACG 
GGCCATAGCC 
GCCGTGCGCT 
ATGG7ATCGA 
TC3G7G7CGA 
TTCG7GCAGC 
GTTTCGCACG 
ACCGAGTCGG 
CCC7GCACGG 
CGCTTCGCGG 
GGCCCGAAGC 
CTGCTCCCAG 
GGGTTCTGGG 
CGGG7ACCGC 
GATGG7GAGG 
GCCTTTTCCG 
CGGCTGCCG7 
TACCTGGAGA 
ACGGATGTGG 
GTGGTGACGA 
GGGGCACGTC 
GCCGAGACCC 
CCCGCTGCGG 
CGGGGGCTCG 
GAGTCCGCCG 
CAAATGATTG 
G7GGGCTCGG 
GTGAGCGATG 
ACGGGCGC3G 


GGACCGAGCC 
CGGAGGCATT 
CCTGGGCGCT 
CCGAGGCGGT 
CGCTCGATCC 
GCATCGCACC 
GCGACTACTC 
CCGGCAATAC 
CCTGCCTGAC 
GCAGCCTGCG 
CGTCCAAGAC 
GGACATTCGA 
TCAAACGGCT 
CGGCCATGAA 
AGGCGCTCTT 
TCGAGACCCA 
CCGTGATGGG 
ACCTCGGCCA 
TGCACCACGA 
TCGA3CGGAC 
CGCGCT7CGC 
AGGAGGCGCC 
TGCTGTCGGC 
TCGCCGCGTA 
GCCCGATGGA 
T3GAAGCTGC 
CGCCCGGCAA 
GTGGGTTGTG 
7CGACCGGGA 
GGTCGTCGTT 
CGCTGGCCGC 
7CGGCGAGCT 
TCGTGGTCGC 
7CGCCGCGC3 
TCGCGGCAGT 
AGATCGCGGC 
CGTTCCACTC 
TGACGTATCG 
ATGAGGTGTG 
ACGGCGTGAA 
CGGCGCTGCT 
CGTCGCGCGC 
TCGTCCGTGG 
TGCCAACCTA 
CGGACGGCAT 
TGTCGACCCA 
GGCTCGGCGA 
TGGCGCTGTC 
TGCTCATCGA 
CCGAGGAGCG 
GCGCGTCC7T 
CGGCGAGGTT 
CTATCTATGG 
CCGAGC7GTG 
GCTCCGCGAC 
rTGGCGCGT7 
TGCGGCTGTT 
GTCAACAGGC 
TGG7CGCCGA 


GATCGGCATC 
CTGGGAGCTG 
GGTCGGCG7C 
GGACGGCT7C 
TCAGCAACGC 
CCAG7CCCTC 
GCATACCGTT 
GCTCAGCGTC 
CGTCGACACG 
CGCTCGCGAG 
GATGATAA7G 
CGCCTCGGCC 
CTCCGACGCC 
TCAGGATGGC 
ACGCCAGGCG 
CGGAACGGGG 
GCCGGCGCGG 
CCTGGAGGGC 
ATCGATCCCG 
CGCGCTCGCG 
GGGGGTGAGC 
GGCCACGGTG 
GAAGAGCACC 
CCCGGAGCAG 
GCACCGGCTC 
GGCGCAGGGG 
GC7CGCCTTC 
GGAGGCGTGG 
GCTCCATCAG 
GCTGGACCAG 
GC7CTTCCGG 
GGTGGCCGCC 
GCGCGGCCG3 
GGAGGCCGAC 
CAA7GGGCCG 
GGCGTTCGCG 
GCCGCTCATG 
GCGGCCTTCG 
CGCGCCGGGT 
GGCGCTGCAC 
CGGCCTT7TG 
CGGGCGTGAC 
ATCGGTCACC 
TCCCTGGCAG 
CGGCCGTGCT 
TGCCGGTCTG 
GCACCGGGCG 
GTCGGGGGCC 
GACGCTGACC 
ACCGGGACGG 
CCGGATCCAC 
GAACCTCGCC 
GGCGCTCGCC 
GCGGGGTGAG 
AGCCTACCAG 
CGCCGA7CGC 
CCAGCGGTCT 
CCCCAGCCGG 
GATCTCCCGG 


ATAGGTATTG 
CTCGACTCGG 
CATCCCAGCG 
GACGCCGCCT 
CTGCTGCTGG 
GACGGCAGCC 
GCGCAACAGC 
GCCGCCGGAC 
GCCTGCTCGT 
AGCGA7CTCG 
CTGGGGCGCA 
AACGGGTTCG 
CAGCGACA7G 
CGGTCCACAG 
CTGCAGAGCG 
ACC7CGCTCG 
GCCGATGGGA 
GCTGCAGGCG 
CGAAACC7CC 
CTGGCGACGG 
GCGT7CGGCC 
CTCGCACCGG 
GCCGCGCTGG 
GGCCTCGGAG 
GCGGTC-GCGG 
CAGACCCCGG 
CTG7TCGCCG 
CCGGCGTTCC 
CCGCTCTGCG 
ACGGCA7TCA 
TCGTGGGGCG 
7GCGTGGCGG 
TTGATGCAGG 
GTGGC7GCC3 
GAGCAGGTGG 
GCGCGGGGGG 
GATCCGATGC 
ATGGCGCTGG 
TACTGGGTGC 
GCGGCC3G7G 
CCGGCCTGCC 
GAGGCTGCGA 
TGG7CGGGTG 
CGCGAGCGTT 
CAGGCGGGGG 
CGCC7GTGGG 
CAGCGGGAGG 
GAGATCTTGG 
TTCGCGGGCG 
CTGCGGTTCC 
GCCCGCGGCG 
GCCCTGCGCG 
GAGATGGGGC 
GGCGAGGCGC 
CTGCATCCGG 
GATGAGGCGA 
CCTGGGGAGC 
TGGAGCGCCG 
CTGGTGGTGG 
AGCGGCTTGC
GGGAGCCCGC 
AGGGTGGTGG 
TCCACGCCGC 
TCGACGGCCA 
TCGACCCGGG 
CCGATGCCCT 
CGCTGGTCGG 
AGGCGGCCGC 
CCATCGCCTT 
CTGAAGGGGA 
TCGCCCTGCG 
AACGCCGGGA 
CCGGCGTGCT 
AGGTCGAGAT 
GCGTTGCTCC 
GCGCCGGGC3 
TGATCGCCCT 
TGCCTCGGCC 
CG3CCTGGTA 
GTGCGGAGGC 
AGGTGTATGC 
GGTACGTGAG 
GCGAGGGTGT 
TGGTCCTGCG 
CGCAGCCTGG 
GAATGATGCT 
TCGCAGCCGG 
CACCGCCGGT 
AAGGACAGCA 
CTCCGGCCGA 
TGGGTGGGCT 
TGCTGGTGGG 
AGGCCCACGG 
AGCGGGTCCT 
CAGGTCTTGT 
TGGGACCTAA 
CCTTCTTCGT 
ATGCCGCAGC 
CGGCGCTGAG 
ACCGTGGCGC 
CAGCTCTGGC 
CGCGGCAGTG 
TGACCACGCA 
TTCCCTCGGC 
TCTCGCATGT 
TGGGCATGGA 
TCGCCGCGCC 
TGCTCGACGA 
CAAGCGCCGG 
GTCTCTTCTG 
AGTCTGAGTG 
AGGACGCGCC 
ACGCACCGTT 
TGGAGCTCGC 
GCrTGATCTC 
TCTTCCGAAA 
CAGACAAGGT 
CCTCGAAGAT 


GAGCGGTGTA 
GGCGCTCGAG 
GCTCGGGCGC 
GGGGGACGAC 
GGCCCCGACG 
GCTCGGGGCG 
CGAGTCGGCG 
CATGGACCTC 
CGCCGGCGAT 
GGAGCACGCC 
AGCCGATGCT 
CGGTGGCGAG 
GAAGATCGCG 
GGACCAACTG 
CGCCGTCGAA 
CAATGACCTG 
CATCGTCGCT 
TGCGGCGGGA 
TCTGGGGCTC 
CGCCCTCGAC 
CGGTGGTATC 
GACCGCCGAC 
CGAT7CCCGC 
GGACGTCGTG 
CGCCTGTGGC 
GCTGCCGCCG 
CGATCAACCG 
TGCCATCAGC 
CGAGACCTTC 
TCTCGGGAAG 
ATCCAGCGTC 
CGGTCTGCGC 
CCGCTCCGGT 
CGCGCGCGTC 
CCGCGAGGTT 
GGATGACGGG 
GGTCCAGGGA 
GCTGTACGC7 
CAACGCGTTC 
CATCGACTGG 
GCGGCTGATC 
GCGCTTGCTC 
GGTGGAGTTC 
GCGCGCGGTT 
TGAGCCGAGC 
GCTGCGTCTC 
CTCGCTGATG 
TGCAGCCTTG 
CGCCCTCGCC 
ATCGTTCGTC 
TTTTCACGGT 
GAGCGATCTG 
TGGTAAGAAG 
TGCGTTAGTA 
TAGTCGTTCC 
TTCTTCAGAG 
TGCCGCGGGT 
CATCACAGAC 
CGCGGTCCCT 


CGCCGGCGCG 
GGGCCCAAGA 
TCGTTGTGCT 
ACCAGCGCTG 
GCCGTGGTGC 
CAGGGC GCGC 
CTGATGCGTG 
CGAAATGCGC 
GTCTCCGTGG 
GAGCIGCGCT 
TTGCTGGCCG 
CGGTTTGTTG 
CCCGCCGGTG 
GTGCTCCGGG 
GCGGCGGGGC 
CCTGGAGGAG 
GTGGGCGAGG 
GTA.TTTGCTA 
TCGGCGACCG 
AAGGTCGCCC 
GGTCTTTGCG 
ACGCCCGAGA 
TCGGGCCGGT 
CTCGACTCGC 
CGCCTTGTGA 
CTCCTACGGA 
GCGAGGATCC 
CCACTGGGGT 
CCGATCTCTC 
CTCGTGCTCA 
GCCGTCCGCG 
GTGGCCGGAT 
GCGGCGAGCG 
ACGGTGGCGA 
ACCGCGTCGG 
CTGCTGATGC 
GCCTTGCACT 
TCTGCAGCTG 
CTCGACGCCC 
GGCATGTTCA 
TCTCGCGGGA 
GAGGGTGATC 
TACCCGGCAA 
GCTGATCGGA 
GCGCGGGCGG 
CCTGAAGACA 
AGCCTGGAGC 
GC-GTGGACGT 
GTCCGGCTTG 
CACGTCCTCC 
TCTGGCGGCT 
GAAATCGTGG 
TACGTCCAAG 
GGGTTCAGCC 
GGCGCACCGG 
ATCACCCCGG 
TTCGTGCGAT 
ACCATGG7GG 
ATCGTCGCCA 


ACGCAGACGA 
TCACAGCCGG 
CAGCGCTGAA 
CAGGAATGCG 
ACCTCAGCAG 
TCGACGCGCC 
GT7GCGACAG 
CGCGGCTGTG 
TGCAAGCGCC 
GTATCAGCGT 
AGCTACTTGC 
CGCGGCTCGT 
ACAGGCCGTT 
CCACGGGGCG 
TCGACTCCAT 
AAATCGAGCC 
GCG7GAACGG 
CCCATGTCAC 
AGGCGGCCGC 
ACCTGCAGGC 
CGGTGCGA7G 
AACGTGCCTA 
7CGCCGCAGA 
TTTCGGGCGA 
AGCTGGGCAG 
ATTTTTCCTT 
GTGCGCTCCT 
CGGGGTTGCG 
GCGCAGCCGA 
CGCTGGACGA 
CGGACGGCAC 
GGCTGGCCGA 
CAGAGCAGCG 
AAGCGGATGT 
GGATGCCGCT 
AGCAGACTCC 
TGCACACGCT 
GGCTG7TCGG 
TTTCGCATCA 
CGGAGGTGGG 
TGCGGGGCAT 
GCGTGCAGAC 
CAGCGGCCTC 
CCGCCGGGGA 
GGCTGCTGCA 
AGATCGAGGT 
TGCGCAACCG 
ACCCAACGGT 
GCGGCGGGTC 
GCTTTCGTCC 
CGCCCGAGGG 
CCATGTGGCA 
AGGCGGCCTC 
TGGGTGTCCG 
CTCCGCTGGC 
AGA7GGAGAC 
CCACCCAACA 
CTCCGGCCCC 
TCGCCGGCTC 


CTGGT7CCTG 
CCGGTGGCTG 
GGCCCCCGGC 
CGCGCTCCTG 
CCTCGACGGG 
CCGGAGCCCA 
CGTGCTCTCC 
GCTTTTGACC 
GC7GTTGGGG 
CGACCTCGAT 
AGATGATGCC 
CCACCGGCTG 
CCCGCTAGAG 
GCGCGCTCCT 
CGACATCCAG 
GTCGGTGCrc 
CCTTGTGGTG 
CACGTCGGCC 
GATGCCCCTC 
GGGGGAGCCC 
GGCGCAGCGC 
CCTGGAGTCG 
CCTGCATGCA 
GCACATCCAC 
GCGCGACGAC 
CTCGCAGGTG 
CGACGAGCTG 
CGTTGGCGGA 
GGCATTCCGG 
CCCGGAGGTG 
CTACCTTG7G 
GCGGGGCGCG 
AGCCGCCCTG 
CGCCGATCGG 
GCGGGGTGTC 
GGCGCGGCTC 
GACACGCGAA 
CTCGCCAGGC 
CCGCAGGGCG 
GATGGCCGTT 
CACCCCCGAT 
GGGGGTGATA 
ACGGAGGTTG 
TCGGGACCTG 
GGACGTCGTG 
GGATGCCCCG 
CATCGAGGCT 
AGCAGCGATA 
GGACACGGAC 
TGTCGTCAAG 
CTTCCGTTCC 
CGATCGCAGC 
GCTCATTCAG 
GTTCGTCATG 
CGTTTTTGCG 
CGATATAATA 
AGTTCAGGCC 
CGGGGACTCG 
GGACGATGTG 


GAGCTGGATT 
CTGCTCGGCG 
CATGTCGTCG 
GCCAACGCGT 
GGCGGCCAGC 
GATGTCGATG 
CTGGTGCAAG 
CGCGGGGCTC 
CTGGGCCGCA 
CCACCCCAGC 
GAGGAGGAGG 
CCCGAGGCTC 
ATCGATGAAC 
GGTCCGGGCG 
CTGGCGGTGG 
GGAAGCGAGT 
GGCCAGCCGG 
ACGCTGG7GT 
GCG TATTTGA 
GTGCTGATCC 
GTGGGCGCCG 
CTGGGCGTGC 
TGGACGGACG 
AAGAGCCTCA 
TGCGCCGACA 
GACTTGCGGG 
TTCGGGTTGG 
TCCCTCACGC 
AGGATGGCGC 
CGGATCCGCG 
ACCGGCGG7C 
GGGCAACTGG 
GCGGCGCTAG 
TCACAGATCG 
GTGCATGCGG 
CGCACGGTGA 
GCGCCTCTTT 
CAGGGCAACT 
CACGGCCTGC 
GCGC AAGAAA 
GAGGGTCTGT 
CCGATCACTC 
TCGCGGCTGG 
CTCGAACAGC 
CGCGTGCAGG 
CTCTCGAGCA 
GCGCTGGGCG 
ACGCGCTGGC 
GAATCGACGG 
CCGCGGCCTC 
TGGTCGGA3A 
CTCGCCTCCG 
CACTATGCAG 
GGGACAGCCG 
TTGGGCGGCA 
GCCAAGCTCT 
GATGCTCGCC 
AAGGAGCCGC 
ATCGTGCCTC 
56461 CAAGCGACGT
56521 CCGGAGATCA 
56581 ATCTCAATCC 
56641 GATGGCAGCO 
56701 CGGCAGGCAG 
56761 CACAGGAGCA 
56821 CTGGGTACGC 
56881 TCTACTGGGA 
56941 TCCGCGACGA 
57001 CGGCCATTCC 
57061 ATCACGCTCG 
57121 TGCGCGCCGA 
57181 AGTTCGACGT 
57241 TGAAGGTTCC 
57301 CGCTCGGCCT 
57361 TCACCGAGGG 
57421 AAAATGACGT 
57481 AGGAGCTGGT 
57541 TTATCGCGTT 
57601 CCGAGCCCGC 
57661 GAATAGGAAC 
57721 AAGGGGAGAT i 
57781 GGCCAGACGT < 
57841 CCCATGTCTG i 
57901 TCTTCCGTAG i 
57961 CGTTCCGGAA ( 
58021 CGSGGGCATC ( 
58081 TCCCGCGGGT ( 
58141 CACGGCCCGG l 
58201 GCGCCGCCCT < 
58261 GGCCCTGCAC C 
58321 TTCTGCTCTC C
58381 TCGCGCGGCG G 
58441 GCCCGTCCGT C 
58501 CGGTCGGGGG C 
58561 CGGGTATCGA G 
58621 CGCTCGGCGC G 
58681 GCACGTTGAC G
58741 CGCCTGCTCG T
58801 GGCCCAGCTC G
58861 CGACGCGATC G
58921 GCTGGAGAGC C
58981 TTTCGCGCAC A
59041 TGCGGTCTGG T 1
59101 GCTCCGCGAC G<
59161 CAGCGTGTCC Gl
59221 CCGCGTACCC G<
59281 CATCCCGCTC TJ
59341 GGCGGCGTGC CC
59401 GCGGCTAGCG CC
59461 TGTCGAACGG A1
59521 TGGCGTCGAT GC
59581 TCTTGTTCGA G(
59641 GCGCCTCGCG Gl
59701 GCGCCTCGCG T1
59761 CGCACGCCGC CG
59821 CAGCAATCTT TT
59881 CGGCTCGTCG GT
59941 CCGCAT GCGG GT


GT TCAGGATCTA 
CA CGAGTTTCTC 
CC GCTGCTCGCC 
CC TCCCTCGGGC 
CGGAGGCTCA 

:a agcgaatcag 
c ggaggacccg 
a TGAAGGCCGC 
A ACGCTTCGCG 
C CGAGCTCAGC 
G GGTCCGCAAG 
A AATACAGCCC 
T TGTGCGGGAT 
C GGCCGAGTGT 
T GGGTTTGGTG 
G GCTCGCGCTG 
T CTTGACGATG 
T CGCGCTCGTG 
T CGCTGTGCTC 
C GCTCATGAGG 
C TGTGCGTTTC 
T GGTCTTTCTC i 
T GTTTGATGTG i 
3 CCCCGGGGTG ' 
3 GTTCCCCGAG , 
\ CATCGAATCA ( 
C GCTTCCCGAA < 
r GCGATTCGAT < 
9 AGAAGAGCCC C 
P GGGAGCGCAA / 
; CCGCACCGAG C 
: GCTCGTCGCC C 
; GCTGCGCCAG C 
' CGTCGGCGCG C 
; CGTGCTCTCC G 
> GGTCGAT3TG A 
: GATCGCGCCC C 
: GTGGGATCTC G 
TACACCTCGC C 
GAGCCGGACT C 
GCCGCGTACT C 
CTCGTGCGGC T 
ACGACGGCCG G' 
TTCGATGrCG O 
GCGGCCTCGG O 
GCTGCCGTAG G( 
GCGTCCGACC A( 
TACACCGCGT T( 
CGGTTGTTCC C' 
CGCGTCGAGG C i 
ATTGCCGCAG CC 
GCCGCCTGGG C/ 
GCACGCATCC T1 
GTCGCACCGC GC 
TTGCGCGCCA CC 
CGCGAGCAGG Cl 
TTGCATGGCT TC 
GTTCGACAGC CG 
GTTTCTCGCA GC 


TA CAATCTCGCA 
TC GTCGATCGAG 
CC GCGAGGACGA 
3C GCGCGAGATG 
CA TGAGCCTTCC 
AGTGAGACGA 

:g tttcccgcga 

3C TCCTGGGTCC 

:g gtcagtcgag 

;c GATATGAAGA 

id ctcgtcaacc 

IC ACCGTCGACC 

lT tacgcggagg 

IT GACGAGAAGT 
G CCCCGGGTCG 
G CTCCATGGCG 
G CTGCTTCAGG 
G GGTGCGATTA 
C AACCTGCTGC 
G AACGCGCTCG 
C GCCAGGCAGG , 
C CTGATCCCGA < 
3 CGACGGGACA i 
3 TCCCTTGCTC I 
3 ATGAAGCTGA i 
^ CTCAACGTCA ' 
\ CCTCATTCTT : 
r CCAGCGGACA I 
: GATGGCGAGC ( 
V AGCTCGCTCG C 
J GAGCCACCCG C 
: CTCGCGCTCG 1 
! CCCGAGGTGC 1 
I CTCGCTCCTG G 
I GGCATCTCCT C 
I AGCATTCTAC G 
I CCGCTGCGCA C 
: GACGTCTCGC C 
CGGTGCTCGC T 
CGCCGGATGA C 
CGGAGGCCGT T 
TCGCC-ATCGT G 
GCGTCTCCCA G. 
CCGCCCGGTA 0 
CCACGGAGGC Gl 
GGGAGTTTCG G< 
AGCAGATCCT Gl 
TCGCCCGTGA G; 
CTCTTCGCAA CC 
CAGAGAGCGC TC 
CCCTCATTGC GC 
CACTCGCCGA AC 
TTGCTCGAAT AC 
GCCGCCACGA TC 
CCCGGGGTTT GC 
CTCATTCCCG AC 
TCCCCTCCCT CA 
CGGCGACGGC CA 
GCATGCCACG AC 


CA CCACGGAGCG 
AG GGCGCGAGAT 
GA CGTCGTCAGG 
TG GTTGGGAGCA 
CC TGGAAGTTTG 
3A AGCCTGCTTT 
3A TCGAGCGCCT 
:C TCACCCGATA 
*G AAGAATGGGA 
3A AGTACGGATT 
PC CATCGTTTAC 
:C AGCTGCTCGA 

;g gaatcccgat 

IT TCCGTCGCTT 

:g atgaggagac 

G TCCTCCATGA 
G CCGAGGCCGA 
A TCGCTGCTGG 
c GGTCGCCCGA 
G ATGAGGTGCT 
G ACCTGGAGTA 
A GCGCCCTGAG , 
A CGAGCGCGAG < 
C CCCTCGAGGC I 
A AAGAAACTCC I 
A TCTTGAAGCC ( 
T TCATGATGCA J 
A AGCCCATTGT ( 
: CCGTCCGGGT / 
9 CTCGCGCTCA ( 
5 CCCTGATGCA C 
> TCCTCCTGAC C 
: TCGGCGAGCT C 
; GGTTCCATCG A 
’ GGATAGGCGC g 
: GCAAGGAGGC G 
>■ CGCCGGGCCC G 
I CGCGACGCTC T 
: TCCGCCCGCG G 
CGAGGCCGAC G 
TCGGTGGGCG G 
GCGGCTGGGC A 
GATCGCCGGC A 
CGCGAGCTTC Ci 
GCTCGCGGCC Gl 
GGGGGAGGCG Gl 
GACCGCGCTG C< 
GGAGTGAGCC T( 
CCATGACCGG AC 
TGGAGCGACA GC 
GGATCCCCTC CJ 
AGGTCAGCTC Gl 
ACTCCCGGTC T1 
TGCTATCGAC GC 
GCGCCTCCCC GC 
ACATCGACAT CP 
CACGACACGT CP 
CACGAGCAGA AC 
ACATCCTTGC GA 


CG CTTCTATATG 
AT CATGCACATC 
CG CCCCGCCTTC 

:a gcgtgggtgc 
:g cagcatagga 

PT CGACTTCAAG 

:t gagagaggca 

PA CCACGACGTG 

;a atcgagcgcg 

’T GTTCGGGCTG 
iC GTCACGCGCG 
IA TGCTCGCTCC 
>T GCGTGCGATC 
T CGGCTCGGCG 
,C CAAGACCCTG 
A GCGGCGCAGG 
A CGGCAGCAGG 
G CACCGATACC 
A GGCGCTCGAG 
T CCGCTTCGAC 
A CTGCGGGGCA 
3 AGATGGGACT 
3 CCTCCCGTAC ' 
C GGAGATCGCC i 
: CGTGTTTGGA ‘ 
- CTCCAAAGCT < 
\ ACTCGCGCGC < 
P CAGCGCGCGA 1 
P AACGTCGGAA ( 
^ GCGCGCCGCT 1 
L CGGCCTCACC C 
: CGCGCGCGCC 1 
’ CTTCGGCGGC C 
! AGTCCTCTTC C 
I GCTCGTCCTG C 
' GCCCCCCGGG C 
GCTGGTGCAG C 
TGCGCAAGCC T 
GACATCCGGC C 
GAGGCGCTCC G 
GAGGCGGCGC A 
AAGGCGCTCG A 
AGACTTCCCC A 
CGCGCGGCGA C 
GGCCCGTACC G< 
GCGCGCCTTC Ai 
CGCGCAGCCG At 
TCTCTCGGGC G< 
AGCCGCGCCC G( 
GGCGACGACC C( 
CAGACACTCG T r . 
GTCGCGCCAG TC 
TTGTCCGATG T1 
GGCGCTGCCG AC 
GCCTGACCGC T1 
CAGGCCCACG AC 
CACATCAGAG AI 
ACCGTCCCCG AC 
GACTAGCGTG CC 


’G CATCTCCTTC 
•C GTCGACTCGC 
’C GAGGCAAAAT 
IC TGGTGGCCGG 
A GATTTTATGA 
G CCGTTC3CGC 
A ACCCCCATCT 
G TCGGCGGTGT 
G GAGTACTCGT 
G CCGCCCGAGG 
G ATCGACCTGC 
C GGACAAGAGG 
C AGCGCTCTGT 
3 ACTGCGCGCG 
3 GTCGCGTCCG 
3 AACCCGCTCG 
1 CTGAGCACGA 

; acgatctacc 

; CTGGTGAAGG 
; AATATCCTCA 
i TCGATCAAGA 
' GTATTCTCCA 
I GGTAGAGGCC 
I GTGGGCACCA 
■ TACCACCCCG 
G3ATAACTCG 
GGGTGCTGTC 
AGATCGAATC 
GAAGTGCCGG 
TGCCATGTCC 
GAGCGGCAGG 
TTCGGCGAGC 
GTGGTGCTGG 
CAGGATCCGG 
CTGCTCATGG 
GCGCTCTCGG 
CGCATGCAGG 
TGAGCCTCGG 
CGCCCCCCGC 
GCCCGTTCCG 
AGCGGCCGCG 
ACAAGGCACC 
ACAAAACGAA 
CGGAGCACGC 
GCGGATCGAG 
ACCCCGCGGA 
AGCGGGCGCT 


GCAGCCGAGC 

GGTCCGCGCA 

CGCCCGAGCG 

TTCAGCGCCT 

TCGGATCGGA 

TTGTTGCACC 

ACTGGCACCG 

TTTTCGCCGC 

ACCAGTTTCC 

ATTCTCCGCT 

ACCAGAACAG 

CCTCCGCTCG 
60001 TGCCGAGATC GGCTGTCCTG TGCGACGGCA ATGTCCTGCG ATCGGCCGGG CAGGATCGAC
60061 CGACACGGGC GCCGGGCTGG AGGTGCCGCC ACGGGCTCGA AATGCGCTGT GGCAGGCGCC 
60121 TCCATGCCCG CTGCCGGGAA CGCAGCGCCC GGCCAGCCTC GGGGCCACGC TGCGAACGGG 
60181 AGATGCTCCC GGAGAGGCGC CGGGCACAGC CGAGCGCCGT CACCACCGTG CGCACTCGTG 
60241 AGCGCTAGCT CCTCGGCATA GAAGAGACCG TCACTCCCGG TCCGTGTAGG CGATCGTGCT

60301 GATCAGCGCG TCCTCCGCCT GACGCGACTC GAGCCGGGTA TGCTGCACGA CGATGGGCAC 
60361 GiCCGATTCG ATCACGCTGG CATAGTCCGT ATCGCGCGGG ATCGGCTCGG GGTCGGTCAG 
60421 A7CGTTGAAC CGGACGTGCC GGGTGCGCCT CGCTGGAACG GTCACCCGGT ACGGCCCGGC 
60481 GGGGTCGCGG TCGCTGAAGT AGACGGTGAT GGCGACCTGC GCGTCCCGGT CCGACGCATT 
60541 CAACAGGCAG GCCGTCTCAT GGCTCGTCAT CTGCGGCTCA GGTCCGTTGC TCCGGCCTGG
60601 GATGTAGCCC TCTGCGATTG CCCAGCGCCT CCGCCCGATC GGCTTGTCCA TGTGTCCTCC 
60661 CTCCTGGCTC CTCTTTGGCA GCCTCCCTCT GCTGTCCAGG TGCGACGGCC TCTTCGCTCG 
60721 ACGCGCTCGG GGCTCCATGG CTGAGAATCC TCGCCGAGCG CTCCTTGCCG ACCGGCGCGC 
60781 TGAGCGCCGA CGGGCCTTGA AAGCACGCGA CCGGACACGG GATGCCGGCG CGACGAGGCC 
60841 GCCCCGCGTC TGATCCCGAT CGTGGCATCA CGACGTCCGC CGACGCCTCG GCAGGCCGGC

60901 GTGAGCGCTG CGCGGTCATC GTCGTCCTCG CGTCACCGCC ACCCGCCGAT TCACATCCCA 
60961 CCGCGGCACG ACGCTTGCTC AAACCGCGAC GACACGGCCG GGCGGCTGTG GTACCGGCCA 
61021 GCCCGGACGC GAGGCCCGAG AGGGACAGTG GGTCCGCCGT GAAGCAGAGA GGCGATCGAG 
61081 GTGG7GAGAT GAAACACGTT GACACGGGCC GACGAGTCGG CCGCCGGATA GGGCTCACGC 
61141 TCGGTCTCCT CGCGAGCATG GCGCTCGCCG GCTGCG3CGG CCCGAGCGAG AACACCGTGC
61201 AGGGCACGCG GCTCGCGCCC GGCGCCGATG CGCACGTCAC CGCCGACGTC GACGCC3ACG 
61261 CCGCGACCAC GCGGCTGGCG GTGGACGTCG TTCACCTCTC GCCGCCCGAG CGGATCGAGG 
61321 CCGGCAGCGA GCGGTTCGTC GTCTGGCAGC GTCCGAACTC CGAC-TCCCCG TGGCTACGGG 
61381 TCCGAGTGCT CGACTACAAC GCTGCCAGCC GAAGAGGCAA GCTGGCCGAG ACGACCGTGC 
61441 CGCATGCCAA CTTCGAGCTG CTCATCACCG TCGAGAAGCA GAGCAGCCCT CAGTCGCCAT
61501 CGTCTGCCGC CGTCATCGGG CCGACCTCCG TCGGGTAACA TCGCGCTATC AGCAGCGCTG 
61561 AGCCCGCCAG CATGCCCCAG AGCCCTGCCT CGATCGCTTT CCCCATCATC CGTGCGCACT 
61621 CCTCCAGCGA CGGCCGCGTC AAAGCAACCG CCGTGCCGGC GCGGCTCTAC GTGCGCGACA 
61681 GGAGAGCSTC CTAGCGCGGC CTGCGCATCG CTGGAAGGAT CGGCGGAGCA TGGAGAAAGA 
61741 ATCGAGGATC GCGATCTACG GCGCCGTCGC CGCCAACGTG GCGATCGCGG CGGTCAAGTT
61801 CA7CGCCGCC GCCGTGACCG GCAGCTCTGC GATGC7CTCC GAGGGCGTGC ACTCCCTCGT 
61861 CGATACCoCA GACGGGCTCC TCCPCOTGCT CGGCAAGCAC CGGAGCGCCC GCCCGCCCGA 
61921 CGCCGAGCAT CCGTTCG3CC ACGGCAAGGA GCTCTATTTC TGGACGCTGA TCGTCGCCAT 
61981 CATGATCTTC GCCGCGGGCG GCGGCGTCTC GATCTACGAA GGGATCTTGC ACCTCTTGCA 
62041 CCCGCGCTCG ATCGAGGATC CGACGTGGAA CTACGTTGTC CTCGGCGCAG CGGCCGTCTT

62101 CGAGoGGACG TCGCTCGCCA TCTCGATCCA CGAGTTCAAG AAGAAAGACG GACAGGGCTA 
62161 CGTCGCGGCG ATGCGGTCCA GCAAGGACCC GACGACGTTC ACGATCGTCC TGGAGGATTC 

62221 CGCGGCGCTC GCCGGGCTCG CCATCGCCT7 CCTCGGCGTC TGGCTTGGGC ACCGCCTGGG 

62281 AAAC^-CCTAC CTCCACGGCG CGGCGTCGA7 CGGCATCGGC CTCGTGCTCG CCGCGGTCGC

62341 GGTCTTCCTC GCCAGCCAGA GCCGTGGACT CCTCGTAGGG GAGAGCGCGG ACAGGGAGCT

62401 CCTCGCCGCG ATCCGCGCGC TCGCCAGCGC AGATCCTGGC GTGTCGGCGG TGGGGCGGCC 

62461 CCTGACGATG CACTTCGGTC CGCACGAAGT CCTGGTCGTG CTGCGCATCG AGTTCGACGC 

62521 CGCGCTCACG GCGTCCGGGG TCGCGGAGGC GATCGAGCGA ATCGAGACAC GGATACGGAG 

62581 CGAGCGACCC GACGTGAACC ACATCTACGT CGAGGCCAGG TCGCTCCACC AGCGCGCGAG 

62641 GGCGTGACGC GCCGTGGAGA GACCGCTCGC GGCCTCCGCC ATCCTCCGCG GCGCCCGGGC

62701 TCGGGTAGCC CTCGCAGCAG GGCGCGCCTG GCGGGCAAAC CG7GAAGACG TCGTCCTTCG 

62761 ACGCGAGGTA CGCTGGTTGC AAGTTGTCAC GCCGTATCGC GAGGTCCGGC AGCGCCGGAG 

62821 CCCGGGCGGT CCGGGCGCAC GAAGGCCCGG CGAGCGCGGG CTTCGAGGGG GCGACGTCAT 

62881 GAGGAAGGGC AGGGCGCATG GGGCGATGCT CGGCGGGCGA GAGGACGGCT GGCGTCGCGG

62941 CCTCCCCGGC GCCGGCGCGC TTCGCGCCGC GCTCCAGCGC GGTCGCTCGC GCGATCTCGC

63001 CCGGCGCCGG CTCATCGCCG CCGTGTCCCT CACCGGCGGC GCCAGCATGG CGGTCGTCTC 

63061 GCTGTTCCAG CTCGGGATCA TCGAGCACCT GCCCGATCCT CCGCTTCCAG GGTTCGATTC 

63121 GGCCAAGGTG ACGAGCTCCG ATATCGCGTT CGGGCTCACG ATGCCCGACG CGCCGCTCGC 

63181 GCTCACCAGC TTCGCGTCCA ACCTGGCGCT GGCTGGCTGG GGAGGCGCCG AGCGCGCGAG

63241 GAACACCCCC TGGATCCCCG TCGCCGTGGC GGCCAAGGCG GCCGTCGAGG CGGCCGTGTC

63301 CGGATGGCTC CTCGTCCAGA TGCGACGGCG GGAGAGGGCC TGGTGCGCGT ACTGCCTGGT 

63361 CGCCATGGCG GCCAACATGG CCGTGTTCGC GCTCTCGCTC CCGGAAGGGT GGGCGGCGCT 

63421 GAGGAAGGCG CGAGCGCGCT CGTGACAGGG CCGTGCGGGC GCCGCGGCCA TCGGAGGCCG 

63481 CCGTGCACCC GCTCCGTCAC GCCCCGGCCC GCGCCGCGGT GAGCTGCCGC GGACAGGGCG
63541 CGTACCGTGG ACCCCGCACG CGCCGCGTCG ACGGACATCC CCGGCGSCTC GCGCGGCGCG
63601 GCCGGCGCAA CTCCGGCCCG CCGCCGGGCA TCGACATC7C CCGCGAGCAA GGGCACTCCG
63661 CTCCTGCCCG CGTCCGCGAA CGATGGCTGC GCTGTTTCCA CCCTCGAGCA ACTCCGTTTA 
63721 CCGCGTGGCG CTCGTCGGGC TCArCGCCTC GGCGGGCGGC GCCATCCTCG CGCTCATGAT 
63781 CTACGTCCGC ACGCCGTGGA AGCGATACCA GTTCGAGCCC GTCGATCAGC CGGTGCAGTT

63841 CGATCACCGC CATCACGTCC ACCACGATGG CATCGATTGC GTCTACTGCC ACACCACGGT 
63901 GACCCGCTCG CCGACGGCGG GGATGCCGCC GACGGCCACG TGCATGGGGT GCCACAGCCA 
63961 GA1 CTGGAAT CAGAGCGTCA TGCTCGAGCC CGTGCGGCGG AGCTGGTTCT CCGGCATGCC 
64021 GATCCCGTGG AACCGGGTGA ACTCCGTGCC CGACTTCGTT TATTTCAACC ACGCGATTCA 
64081 CGTGAACAAG GGCGTGGGC7 GCGTGAGCTG CCACGGGCGC GTGGACGAGA TGGCGGCCGT
64141 CTACAAGGTG GCGCCGATGA CGATGCGCTG GTGCC7GGAG TGCCATCGCC TGCCGGAGCC 
64201 GCACCTGCGC CCGCTCTCCG CGATCACCGA CATGCGCTGG GACCCGGGGG AACGGAGGGA 
64261 CGAGCTCGGG GCGAAGCTCG CGAAGGAGTA CGGGG7CCGC CGGCTCACGC ACTGCACAGC 
64321 GTGCCATCGA TGAACGATGA ACAGGGGATC TCCGTGAAAG ACGCAGATGA GATGAAGGAA 

64381 TGGTGGCTAG AAGCGCTCGG GCCGGCGGGA GAGCGCGCGT CCTACAGGCT GCTGGCGCCG

64441 CTCATCGAGA CCCCCCAGCT CCGCGCGCTC GCCGCGGGCG AACCGCCCCG GGGCGTGGAC 

64501 GAGCCGGCGG GCGTCAGCCG CCGCGCGCTG CTCAAGCTGC TCGGCGCGAG CATGGCGCTC 

64561 GCCGGCGTCG CGGGCTGCAC CCCGCATGAG CCCGAGAAGA TCCTGCCGTA CAACGAGACC 

64621 CCGCCCGGCG TCGTGCCGGG TCTCTCCCAG TCCTACGCGA CGA3CATGGT GCTCGACGGG 

64681 TATGCCATGG GCCTCCTCGC CAAGAGCTAC GCGGGGCGGC CCATCAAGAT CGACCCCAAC

64741 CCCGCGCACC CGGCGAGCCT CGGCGCGACC GGCGTCCACG AGCAGGCCTC GATCCTCTCG 

64801 CTGTACGACC CGTACCGCGC GCGCGCGCCG ACGCGCGGCG GCCAGGTCGC GTCGTGGGAG 

64861 GCGCTCTCCG CGCGCTTCGG CGGCGACCGC GAGGACCCCG GCGCTGGCC7 CCGCTTCGTC

64921 CTCCAGCCCA CGAGCTCGCC CCTCATCGCC GCGCTGATCG AGCGCGTCCG GCGCAGGTTC 

64981 CCCGGCGCGC GGTTCACCTT CTGGTCGCCG GTCCACGCCG AGCAAGCGC? CGAAGGCGCG

65041 CGGGCGGCGC TCGGCCTCAG GCTCTTGCCT CAGCTCGACT TCGACCAGGC CGAGGTGATC 

65101 CTCGCCCTGG ACGC5GACTT CCTCGCGGAC ATGCCGTTCA GCGTGCGCTA TGCGCGCGAC 

65161 TTCGCCGCGC GCCGCCGACC CGCGAGCCCG GCGGCGGCCA TGAACCGCCT CTACGTCGCG 

65221 GAGGCGATGT TCACGCCCAC GGGGACGCTC GCCGACCACC GGCTCCGCGT GCGGCCCGCC 

65281 GAGGTCGCGC GCGTCGCGGC CGGCGTCGCG GCGGAGCTCG TGCACGGCCT CGGCCTGCGC

65341 CCGCGCGGGA TCACGGACGC CGAC3CCGCC GCGCTGCGCG CGCTCCGCCC CCCGGACGGC 

65401 GAGGGGCACG GCGCCTTCGT CCGG3CGCTC GCGCGCGATC TCGCGCGCGC GGGGGGCGCC 

65461 GGCGTCGCCG TCGTCGGCGA CGGCCAGCCG CCCATCGTCC ACGCCCTCCG GCACGTCATC 

65521 AACGCCGCGC TCCGCAGCCG GGCGGCCTGG ATGGTCGATC CTGTGCTGAT CGACGCGGGC 

65581 CCCTCCACGC AGGGCTTCTC CGAGCTCGTC GGCGAGCTCG GGCGCGGCGC GGTCGACACC

65641 1GATCCTCCT CGACGTGAAC CCCGTGTACG CCCCGCCGGC CGACGTCGAT TTCGCGGGCC 

65701 TCCTCGCGCG CSTGCCCACG AGCTTGAAGG CCGG3CTCTA CGACGACGAG ACCGCCCGCG 

65761 CTTGCACGTG GTTCGTGCCG ACCCGGCATT ACCTCGAGTC GTGGGGGGAC GCGCGGGCGT 

65821 ACGACGGCAC GGTCTCGTrC GTGCAACCCC TCGTCCGGCC GCTGTTCGAC GGCCGGGCGG

65881 TGCCCGAGCT GCTCGCCGTC TTCGCGGGGG ACGAGCGCCC GGAl'CCCCGG CTGCTGCTGC

65941 GCGAGCACTG GCGCGGCGCG CGCGGAGAGG CGGATT7CGA GGCCTTCTGG GGCGAGGCAT 

66001 TGAAGCGCGG CTTCCTCCCT GACAGCGCCC GGCCGAGGCA GACACCGGAT CTCGCGCCGG 

66061 CCGACCTCGC CAAGGAGCTC GCGCGGCTCG CCGCCGCGCC GCGGCCGGCC GGCCCCGCGC 

66121 TCGACGTGCC GTTCCTCAGG TCGCCGTCGG TCCACGACGG CAGGTTCGCC AACAACCCCT

66181 GGCTGCAAGA GCTCCCGCGG CCGATCACCA GGCTCACCTG GGGCAACGCC GCCATGATSA

66241 GCGCGGCGAC CGCGGCGCGG CTCGGCGTCG AGCCCGGCGA TGTCGTCGAG CTCGCGCTGC 

66301 GCGGCCGTAC GATCGAGATC CCGGCCGTCG TCGTCCGCGG GCACSCCGAC GACGTGATCA 

66361 GCGTCGACCT CGGCTACGGG CGCGACGCCG GCGAGGAGGT CGCGCGCGGG GTGGGCGTGT 

66421 GGGGGrA TCG GATCCGCCCG TCCGACGCGC GGTGGTTCGC GGGGGGCCTC TCCGTGAGGA

66481 AGACCGGCGC CACGGCCGCG CTCGCGCTGG CTCAGATCGA GCTGTCCCAG CACGACCGTC

66541 .CCATCGCGCT CCGGAGGACG CTGCCGCAGT ACCGTGAACA GCCCGGTTTC GCGGAGGAGC 
66601 ACAAGGGGCC GGTCCGC7CG ATCCTGCCGG AGGTCGAGTA CACCGGCGCG CAATGGGCGA 
66661 TGTCCATCGA CATG1CGATC TGCACCGGGT GCTCCTCGTG CCTCGTGGCC TGTCAGGCCG 
66721 AGAACAACGT CCTCGTCGTC GGCAAGGAGG AGG7GATGCA CGGCCGCGAG ATGCAGTGGT 
66781 TGCGGATCGA TCAGTACTTC GAGGGTGGAG GCGACGAGGT GAGCGTCGTC AACCAGCCGA

66841 TGCTCTGCCA GCACTGCGAG AAGGCGCCGT GCGAGTACGT CTGTCCGGTG AACGCGACGG 
66901 TCCACAGCCC CGATGGCCTC AACGAGATGA TCTACAACCG ATGCATCGGG ACGCGCTTTT 
66961 GCTCCAACAA CTGTCCGTAC AAGATCCGGC GGTTCAATTT CTTCGACTAC AATGCCCACG 
67021 TCCCGTACAA CGCCGGCCTC CGCAGGCTCC AGCGCAACCC GGACGTCACC GTCCGCGCCC 
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67081 
67141 
67201 
67261 
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67381 
67441 
67501 
67561 
67621 
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69721 
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50 


69841 

69901 

69961 

70021 

70081 

70141 

70201 

70261 

70321 

70301 

70441 

70501 

70561 


GCGGCGTCAT 
CGCAGATCGA 
GTCCGACCGG 
GGCGCAGGGA 
AGTACCTCGC 
GACCCGGAGC 
GAGCCCCGAG 
CACCGACCGA 
GGCTCGGCTG 
CGATCACCTA 
GGGCCTTCGC 
TCTCCGCCAT 
AGGCGATGAC 
GCCCCTGGTT 
TCCGGAGCGC 
TGTTCTGGTA 
GCGTCCGGCG 
TCCGGCATTA 
CGSTGCACTC 
CGCTCTTTCC 
CGCTGCTCAT 
TCGACGATCT 
TCGAGAACTT 
CGCGCCTGCA 
TCATCCAGCT 
CCCTCCrGGT 
AGCAAGAGTT 
TCTTCATCGG 
CGTTCATCCC 
GCGAGGGGGG 
GCTCCATGCG 
CTATCCGGTG 
GCTGCTGCCC 
CGCTTTCCAC 
CCCGATCACG 
TTACCTGACG 
CGTCACGCTG 
CCAGGCGGAG 
CGAGGAGCCA 
GCCCTCGTCC 
ATGATCCGCC 
GCGATGCAGC 
TATCTCCAGG 
GAGCTCGTGC 
CTCGGCGACG 
ATCGGACCCG 
TACGGCCTGA 
CrCGCCTACG 
GCGCTCCGCG 
GCGGCGCGAC 
TCGGCGGCTT 
CGGTGTTTCT 
CCTGGCCCAC 
CGGCGCTCTC 
AGCGGATCGC 
CAGGCTTCTT 
TGCTCCGCCG 
TGTATGGCCT 
TCGACTGGCT 


GGAGAAATGC 
GCGGCGGCCG 
CGCGATCCAG 
GCCGCGCGCG 
CAAGATCGAG 
CCCG AGCGTC 
CGTCAAACCG 
CGATCAGCTG 
GATGCTCGCG 
CACCGTCCTC 
GATCACCAAC 
CCTCCTCCTG 
GCTCTTCGCG 
CGCCTACTGG 
GCTGC3G7GG 
CATGGGCCTC 
GGTGATCTAC 
CCGGGTGCTG 
GATCGTGAGC 
GCCGTTCTTC 
CCCGGTGCGG 
CGCGAAGATG 
CCTCGCCTGG 
CGGCCCGAAC 
CCTCTGGAGC 
CAACGTCGGG 
CCTCCCGTCC 
GTCAGGCGGC 
CGTCGCGGAG 
CCGCTGATGG 
ATCCGAGAGC 
AAGGGGCTCG 
TTCGCGATCC 
TATCCGCTGA 
TTCGAGATGG 
AGGCTGCCGA 
GATCGGTT7C 
CGCGACCTCC 
TGAGGGCCGG 
TGCTCGCCGG 
AGGAGAAATA 
ACCCGCCCGA 
GCGTCCTCGA 
AGCGCGGCCG 
GCAGCTCGCG 
AGGCGCGGAG 
TGCCGCGCTA 
TGAAGGCGCT 
GCCGGGCAGA 
GATCGCGGCC 
CGTCGATCTC 
GTCCGTGGGC 
GGCGGTGCGC 
CGCGCCGATC 
CGGCGAGCAC 
CGTCGTGCGC 
GCGATCGTTC 
GAGCGGCGCC 
CATGTCCCTC 


ACGTACTGCG 
CTCCGGCCGG 
TTCGGG7CGC 
TACGCCGTGC 
AACCCGAACC 
AAACCCGCGC 
GAGATTGAAT 
TCGAAGCAGC 
TTCGGGCTCG 
ACCGGGATCG 
TTCGTCTGGT 
CTCGAGCAGA 
GTCGTCCAGG 
ATCTTCCCGT 
GACGCCGCCG 
GTCCCGGATC 
GGGCTCATGT 
TACGGGCTGC 
AGCGATTTCG 
GTCGCGGGCG 
CGGA7CTACG 
ACGCTCGTGA 
TACAGCGGCT 
AGCGCCGCCT 
GAGCGGATCC 
ATGTGGAGCG 
AAGTGGCACG 
TTCTTCATGC 
GTCAAGGAGC 
AGACCGGAAT 
TCAGGCGGCG 
ACGAGGCGCT 
TGGGGGTCG7 
ACGTGGGCGG 
GGGTGCTCTC 
GGCTCTACCT 
TGGTCGGGCT 
TCGCGCTCGG 
CGCCCCGGCT 
GTGCCGCGAG 
CGGACTCTGG 
GGGGACCGTC 
CGGGGCGTAC 
GCAGCGCTTC 
CGTGGCGACG 
CTTCCCGCCG 
CTCGGACGAT 
TCAGCTGAGC 
GCAGGAGCTG 
TCGCTCGCGA 
CGCCGGTTCT 
CCGCTCG7CA 
CGCCTCCTCG 
CTGGTCGGCC 
GCGCGGCGCA 
TCGGCGATCT 
GCGCAGGACC 
ATGC7GCCGG 
GACGCGACCT 


TGCAGCGGAT 
GCGAGGTGGT 
TGGATCACGC 
TCCACGACCT 
CGGGGCTCGG 
TCGGGGCGGA 
GAGCCATGGC 
TCCTCGAGCC 
CGCTCGGCGG 
GCGTGTGGGG 
GGATCGGGAT 
A3TGGCGGAC 
CCGGCCTCTT 
ACCCCGCGAC 
CGATCGCGAC 
7GGCGCCGCT 
CGTTCGGCTG 
TCGCGGGGCT 
CGATCGCCCT 
CGATCTTCTC 
GGCTCCATAA 
CCGGCTGGAT 
CGGCGTACGA 
ACTGGGCCCA 
GGACGA3CCC 
AGCGGTrCAC 
GCTACAGCCC 
TCCTGTTCCT 
TCAACCATGA 
GCTCGGCGAG 
CGGCTACCGC 
CGGCCTCCCG 
GGGCGGCTAC 
GCGCCCGCTG 
CACCTCGATC 
CCCGCTCTTC 
CGACGACACG 
CGCCCGGCGC 
CGCCCTCTCG 
AAGGTGCTGC 
GAGCCGTGCG 
GCGCGCGGGC 
GTCACGGAGG 
GAGACCTTCT 
AACATGACGC 
GGCAGGATCT 
CTGCCCGACA 
CGCGGAG'i’GG 
CCATGAACAG 
TCGCGGCGCT 
TCTTCTCGTA 
CGCTCCTCAC 
AGACGATGGT 
TGGACACGCT 
TCCTCGAGCA 
ACTTCGCGAT 
GTGAGCCGAG 
TCGTGGCGAT 
GGTACTCGAC 


CCGAGAGGCG 
CACCGCCTGC 

ggatacaaag 
CGGCACCCGG 
GGCGGAGGGC 
CGGCGCCGAG 
GGGCCCGCTC 
GGTATGGAAG 
CACGGGCCTG 
CAACAACATC 
CCGCCACGCC 
GAGCATCAAC 
TCCGGTCCTC 
GATGCAGGTG 
dACTTCACG 
GCGCGACCAC 
GCACGGCGCG 
CGCGACGCCC 
GGXGCCCGGC 
CGGGTTCGCG 
CGTCGTGACC 
CGTCATCCTC 
GATGCATCAG 
GCACGTCTGC 
CGTCGCGCTC 
GCTCATCGTG 
GACGTGGGTG 
GAGCTTTT7G 
AGAGCTGGAG 
TTCGATGACC 
CGGGTGGAAG 
CGCTCGAACC 
TTCGTCCAGT 
AACTCGGCGC 
TTCGGCGTGC 
GACGCCCCGG 
GAACCTTCCT 
GTCGTCGTCG 
GGCGCGCGCT 
CCGAGCCGGA 
AGCACTTCGA 
GCGTCACCGG 
TGCCGCTCTT 
GCGCGCCGTG 
TGCGCCCGCC 
ACCA3GTCAT 
TCGAAGAGCG 
CCGCGGGCGC 
GGATGCCATC 
CGGCGCGGTC 
CCTCGCCGCG 
CTGCAACGCC 
GGCGCCGCTG 
GTATCCGTGG 
CAGGGCGCCC 
CTGGATCGCC 
GGCCGACGTC 
CACGATCGTC 
GATGTTCCCG 


GACATCCGCG 
CAGCAGGCCT 
ATGGTCGCGT 
CCGCGGACGG 
GCCGAGAGGC 
AGGCGACCCG 
ATCCTGGACG 
CCGCGCTCCC 
CTCTTCCTCG 
CCGGTCGCCT 
GGGACGTTCA 
CGCTTCGCCG 
CACCTCGGCC 
TGGCCGCAGT 
GTGTCGCTCC 
GCCCCGGGCC 
GCCGACCACT 
CTCGTCGTCT 
TGGCACTCGA 
ATGGTCCTCA 
GCGCGCCACC 
TL'GTACATCA 
TTTTTCCAGA 
AACGTGCTCG 
TGGCTCATCT 
ATGTCGCTCG 
GACTGGAGCC 
CGCGTCTTTC 
AAGGCTCGGG 
CGGAGGCGAT 
CGTTCACGCC 
TCAACCGGAT 
GGTTCTGCAA 
CGGCGTTCAT 
TCATCGGCTT 
GCTTCGAGCG 
TCTCGAGCGC 
CGAGGAGGCG 
CGCGCCGTTC 
CTTCGAGCGG 
CGACGGCCGC 
GCCGCCCGGC 
GCTCACCGTC 
CCACSGGATC 
CCCGTCGCTC 
CATCGAGGGC 
CTGGGCCGTG 
CCTCCCGCCA 
GAGTACAAGG 
GCCGCGATCG 
TGGTCGTTCG 
ATGCGCGCGG 
CCTCTGCTCG 
ATGCACCCCG 
TACTTCAATC 
GTCGCCC7CG 
AAGGACGCGA 
TTCTCGTCGT 
GTCTACGTGT 
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70621 

70681 

70741 

70601 

70861 

70521 

70S81 

71041 

71101 

71161 

71221 

71281 

71341 

71401 

71461 

71521 

71581 

71641 

71701 

71761 

71821 

71881 

71941 


TCGCGAGCGC 

CGTCCGGCTA 

TCGCGTTCAC 

CGAACAAGCC 

CCTCCGTGCT 

CGATCAAGCG 

ACATCGACTT 

GGCTCGACCT 

GGCTGCGAGG 

ACCGGAGCAT 

CCCCTGGGGG 

GCTCTGGGCC 

CCCCGAGAAG 

CGAGGCGCGC 

CCTCGTCGAT 

GGTGGCGGGG 

CTCGTGTCGC 

TCCGCGGCCG 

GCCTTCCTGG 

CGCCAGCTCG 

GACGGCAAGC 

GTGCTGCGCG 

CACGCGCTCA 


CTTCGTGACC 

CCTCGCGAGG 

GATATTCTGG 

CGATGAGGTC 

CGTCGTCCTC 

GCGCCCGCGC 

TCACTGGCTC 

CGCGACCCTG 

GCGGCCGGTG 

ATGATGTTCC 

CGCGTGATCC 

TCGCTCGCGA 

GATCTCGGGC 

CTGGGCCAGC 

CG GGAG AGG G 

GGCGCGCGA7 

GCCCGGGCGC 

ACGCCGCGCC 

AGCCCACGCG 

CCTTCACCGA 

CCCTCCTCCT 

GCCCCGTCGA 

CGGTCAGCTT 


GCCGTCGGCG 

CTGAACGACT 

GCCTATGCGS 

GCCTTCTTCC 

ACGCGGTTCG 

CAGCTCTCGT 

GTGGTGCCGG 

TGCGTCGTGG 

GTCCCG3TCC 

GTnCCGTCA 

TCGCGTTCGC 

TGCGGGCCCG 

CGC3GCGCGA 

AGCTCGTCGA 

GCATCGTGAG 

GAGCCGGGCC 

CGCGTCCGAG 

GGCGAGCGAC 

CGGGGTGGAC 

CATGGACGGG 

CGTCCTCGCG 

GGGGCTGAAG 

CC-ACCCGCGC 


CGCTCACGGT 

CGCACTAT7A 

CCTATTTCCA 

TCGACCGCTG 

TCGTCCCGTT 

GGATGGCGCT 

CGACAGGGCG 

GCGGCCTC7C 

ACGACCCGCG 

CAGCGAGGTT 

CGTCCTGCTC 

CGAGGCGGAT 

GGTCGGCATG 

CGCGCAGCGC 

CATCCCGATC 

GTCGCCGTGG 

CCCGAGCGCG 

GGCTCCGGCG 

ATCGAGGAGC 

CGGCGGGTGC 

TACTACCGGT 

CTCCTCCCGT 

GAGCGCCCGC 


CCTCTCGTAT 

CGCGCTCGGG 

CTTCATGTT3 

GGAAGGGCCC 

CCTGATCCTG 

CTGGGTCGTC 

CCACGGGTTC 

GACCGCGTTC 

GCTCGAAGAG 

CGCCAGGAGC 

GCGATCGGCG 

CTGCGGCCCT 

GTCCAGCAGT 

C-CGGAGCTCC 

GACGACGCGA 

CCCTCCTGC? 

CGCGCCCCGC 

CGGAGGAGCC 

GCCTCGGCCG 

GCCTCCGCGA 

GTCCCGCGCT 

ACCGGCTCGG 

CGGCCGCDD 


GCCGCGCAGA 

CGGCTGCTCC 

ATCTGGATCG 

TGGCGGCCGA 

ATGTCGTACG 

GTCTCCGGCT 

gcctatcact 

GCCGCGTGGC 

GCCTTTGCGT 

AGGACACGCT 

GCGCGCTGAC 

CCCrCGCGTT 

CGCTGTTCGA 

GCCGCTTCGG 

TCGAGCTCAT 

GGCAGCCGGC 

GCTGGCCCCG 

GCCCGAAGGC 

CCCGGTGGAC 

CTACTTCGCC 

GTGCGGCCTC 

CGAGCAGTTC 


Example 2 

Construction of a Myxococcus xanthus Expression Vector 
The DNA providing the integration and attachment function of phage Mx8 was
inserted into commercially available pACYC184 (New England Biolabs). An ~2360 bp
MfeI-SmaI restriction fragment from plasmid pPLH343, described in Salmi et al., Feb. 1998,
J. Bact. 180(3): 614-621, was isolated and ligated to the large EcoRI-XmnI restriction
fragment of plasmid pACYC184. The circular DNA thus formed was ~6 kb in size and was
called plasmid pKOS35-77.

Plasmid pKOS35-77 serves as a convenient plasmid for expressing recombinant
PKS genes of the invention under the control of the epothilone PKS gene promoter. In one
illustrative embodiment, the entire epothilone PKS gene with its homologous promoter is
inserted in one or more fragments into the plasmid to yield an expression vector of the
invention.

The present invention also provides expression vectors in which the recombinant
PKS genes of the invention are under the control of a Myxococcus xanthus promoter. To
construct an illustrative vector, the promoter of the pilA gene of M. xanthus was isolated
as a PCR amplification product. Plasmid pSWU357, which comprises the pilA gene
promoter and is described in Wu and Kaiser, Dec. 1997, J. Bact. 179(24):7748-7758, was
mixed with PCR primers Seq1 and Mxpil1:

Seq1: 5'-AGCGGATAACAATTTCACACAGGAAACAGC-3'; and
Mxpil1: 5'-TTAATTAAGAGAAGGTTGCAACGGGGGGC-3',


and amplified using standard PCR conditions to yield an ~800 bp fragment. This fragment
was cleaved with restriction enzyme KpnI and ligated to the large KpnI-EcoRV restriction
fragment of commercially available plasmid pLitmus 28 (New England Biolabs). The
resulting circular DNA was designated plasmid pKOS35-71B.
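
As a quick sanity check on the two primers listed above, the following minimal sketch reports each primer's length and GC fraction and looks for the PacI recognition sequence (TTAATTAA). The helper names `gc_fraction` and `summarize` are illustrative assumptions and not part of the original protocol; the primer strings are copied from the text.

```python
# Minimal sketch: inspect the two PCR primers quoted in the text.
# Helper names are hypothetical; only the primer sequences come from the source.

PACI_SITE = "TTAATTAA"  # PacI recognition sequence

primers = {
    "Seq1": "AGCGGATAACAATTTCACACAGGAAACAGC",
    "Mxpil1": "TTAATTAAGAGAAGGTTGCAACGGGGGGC",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(seq: str) -> dict:
    """Collect length, GC fraction, and the index of the first PacI site (-1 if absent)."""
    return {
        "length": len(seq),
        "gc": gc_fraction(seq),
        "paci_at": seq.upper().find(PACI_SITE),
    }


summary = {name: summarize(seq) for name, seq in primers.items()}

for name, info in summary.items():
    print(f"{name}: {info['length']} nt, GC = {info['gc']:.2f}, PacI site index = {info['paci_at']}")
```

The PacI sequence at the 5' end of Mxpil1 is consistent with the later statement that the resulting plasmids carry a PacI site immediately downstream of the promoter's transcription start site.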

The promoter of the pilA gene from plasmid pKOS35-71B was isolated as an ~800
bp EcoRV-SnaBI restriction fragment and ligated with the large MscI restriction fragment
of plasmid pKOS35-77 to yield a circular DNA ~6.8 kb in size. Because the ~800 bp
fragment could be inserted in either one of two orientations, the ligation produced two
plasmids of the same size, which were designated as plasmids pKOS35-82.1 and
pKOS35-82.2. Restriction site and function maps of these plasmids are presented in Figure 3.
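
EcoRV, SnaBI, and MscI all leave blunt ends, which is why the insert can ligate in either orientation. The sketch below illustrates this with toy sequences and repeats the approximate size arithmetic from the text. The arm and insert sequences are hypothetical placeholders, not the actual plasmid sequences, and the function names are assumptions made for illustration.

```python
# Minimal sketch: a blunt-ended insert can ligate into a blunt-cut vector in two orientations.
# All sequences below are toy placeholders, not the real pKOS35-77 or pilA promoter sequences.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def blunt_ligation_products(left_arm: str, insert: str, right_arm: str) -> list:
    """Return the two possible products: insert in forward and in reverse orientation."""
    return [
        left_arm + insert + right_arm,
        left_arm + reverse_complement(insert) + right_arm,
    ]


# MscI cuts TGG^CCA, so the vector arms end in TGG and begin with CCA.
left_arm, right_arm = "TGG", "CCA"
toy_insert = "ATCAAGGTTTAC"  # hypothetical stand-in for the ~800 bp promoter fragment

products = blunt_ligation_products(left_arm, toy_insert, right_arm)

# Approximate sizes quoted in the text (kb).
vector_kb = 6.0
insert_kb = 0.8
expected_kb = vector_kb + insert_kb

print(products)
print(f"expected plasmid size: ~{expected_kb:.1f} kb")
```

Both products have the same length, so the two orientations cannot be told apart by size alone and would have to be distinguished by restriction mapping, as the maps in Figure 3 suggest.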

Plasmids pKOS35-82.1 and pKOS35-82.2 serve as convenient starting materials
for the vectors of the invention in which a recombinant PKS gene is placed under the
control of the Myxococcus xanthus pilA gene promoter. These plasmids comprise a single
PacI restriction enzyme recognition sequence placed immediately downstream of the
transcription start site of the promoter. In one illustrative embodiment, the entire
epothilone PKS gene without its homologous promoter is inserted in one or more
fragments into the plasmids at the PacI site to yield expression vectors of the invention.


The sequence of the pilA promoter in these plasmids is shown below.

[pilA promoter sequence; not legible in this copy]


To make the recombinant Myxococcus xanthus host cells of the invention,
M. xanthus cells are grown in CYE media (Campos and Zusman, 1975, Regulation of
development in Myxococcus xanthus: effect of 3':5'-cyclic AMP, ADP, and nutrition,
Proc. Natl. Acad. Sci. USA 72: 518-522) to a Klett of 100 at 30°C at 300 rpm. The
remainder of the protocol is conducted at 25°C unless otherwise indicated. The cells are
then pelleted by centrifugation (8000 rpm for 10 min. in an SS34 or SA600 rotor) and
resuspended in deionized water. The cells are again pelleted and resuspended in 1/100th of
the original volume.

DNA (one to two µL) is electroporated into the cells in a 0.1 cm cuvette at room
temperature at 400 ohm, 25 µFD, 0.65 V with a time constant in the range of 8.8 - 9.4. The
DNA should be free of salts and so should be resuspended in distilled and deionized water
or dialyzed on a 0.025 µm Type VS membrane (Millipore). For low efficiency
electroporations, spot dialyze the DNA, and allow outgrowth in CYE. Immediately after
electroporation, add 1 mL of CYE, and pool the cells in the cuvette with an additional 1.5
mL of CYE previously added to a 50 mL Erlenmeyer flask (total volume 2.5 mL). Allow
the cells to grow for four to eight hours (or overnight) at 30 to 32°C at 300 rpm to allow
for expression of the selectable marker. Then, plate the cells in CYE soft agar on plates
with selection. If kanamycin is the selectable marker, then typical yields are 10^3 to 10^5 per
µg of DNA. If streptomycin is the selectable marker, then it must be included in the top
agar, because it binds agar.
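
The pulse settings above can be cross-checked against the quoted time constant. For an ideal resistor-capacitor discharge the time constant is the product of resistance and capacitance; the sketch below computes that nominal value for 400 ohm and 25 µF. It assumes the quoted range of 8.8 to 9.4 is in milliseconds, which the text does not state, and the function name is illustrative only.

```python
# Minimal sketch: nominal RC time constant for the electroporation settings in the text.
# Assumption: the reported time constant range (8.8 - 9.4) is in milliseconds.


def rc_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Return R*C in milliseconds (ohm x microfarad = microseconds; divide by 1000)."""
    return resistance_ohm * capacitance_uf / 1000.0


nominal_ms = rc_time_constant_ms(400, 25)
reported_range_ms = (8.8, 9.4)

# Ratio of each reported bound to the nominal value.
fraction_of_nominal = tuple(t / nominal_ms for t in reported_range_ms)

print(f"nominal RC time constant: {nominal_ms} ms")
print("reported / nominal:", [round(f, 2) for f in fraction_of_nominal])
```

Under that assumption, a measured value slightly below the nominal product is what one would expect when the sample itself conducts a little, which fits the instruction that the DNA be free of salts.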

With this procedure, the recombinant DNA expression vectors of the invention are
electroporated into Myxococcus host cells that express recombinant PKSs of the invention
and produce the epothilone, epothilone derivatives, and other novel polyketides encoded
thereby.


Example 3 

Construction of a Bacterial Artificial Chromosome (BAC) for Expression of Epothilone in
Myxococcus xanthus

To express the epothilone PKS and modification enzyme genes in a heterologous 
host to produce epothilones by fermentation, Myxococcus xanthus, which is closely related 
to Sorangium cellulosum and for which a number of cloning vectors are available, can also 
be employed in accordance with the methods of the invention. Because both M. xanthus 
and S. cellulosum are myxobacteria, it is expected that they share common elements of 
gene expression, translational control, and post translational modification (if any), thereby 
enhancing the likelihood that the epo genes from S. cellulosum can be expressed to
produce epothilone in M. xanthus. Secondly, M. xanthus has been developed for gene 
cloning and expression. DNA can be introduced by electroporation, and a number of 
vectors and genetic markers are available for the introduction of foreign DNA, including 
those that permit its stable insertion into the chromosome. Finally, M. xanthus can be
grown with relative ease in complex media in fermentors and can be subjected to
manipulations to increase gene expression, if required. 

To introduce the epothilone gene cluster into Myxococcus xanthus, one can build 
the epothilone cluster into the chromosome by using cosmids of the invention and 
homologous recombination to assemble the complete gene cluster. Alternatively, the 
complete epothilone gene cluster can be cloned on a bacterial artificial chromosome 
(BAC) and then moved into M. xanthus for integration into the chromosome. 

To assemble the gene cluster from cosmids pKOS35-70.1A2 and pKOS35-79.85,
small regions of homology from these cosmids have to be introduced into Myxococcus
xanthus to provide recombination sites for larger pieces of the gene cluster. As shown in
Figure 4, plasmids pKOS35-154 and pKOS90-22 are created to introduce these
recombination sites. The strategy for assembling the epothilone gene cluster in the
M. xanthus chromosome is shown in Figure 5. Initially, a neutral site in the bacterial
chromosome is chosen that does not disrupt any genes or transcriptional units. One such
region is downstream of the devS gene, which has been shown not to affect the growth or
development of M. xanthus. The first plasmid, pKOS35-154, is linearized with DraI and
electroporated into M. xanthus. This plasmid contains two regions of the dev locus
flanking two fragments of the epothilone gene cluster. Inserted in between the epo gene
regions are the kanamycin resistance marker and the galK gene. Kanamycin resistance
arises in colonies if the DNA recombines into the dev region by a double recombination
using the dev sequence as regions of homology. This strain, K35-159, contains small
regions of the epothilone gene cluster that will allow for recombination of pKOS35-79.85.
Because the resistance markers on pKOS35-79.85 are the same as those in K35-159, a
tetracycline transposon was transposed into the cosmid, and cosmids that contain the
transposon inserted into the kanamycin marker were selected. This cosmid, pKOS90-23,
was electroporated into K35-159, and oxytetracycline resistant colonies were selected to
create strain K35-174. To remove the unwanted regions from the cosmid and leave only
the epothilone genes, cells were plated on CYE plates containing 1% galactose. The
presence of the galK gene makes the cells sensitive to 1% galactose. Galactose resistant
colonies of K35-174 represent cells that have lost the galK marker by recombination or by
a mutation in the galK gene. If the recombination event occurs, then the galactose resistant
strain is sensitive to kanamycin and oxytetracycline. Strains sensitive to both antibiotics
are verified by Southern blot analysis. The correct strain is identified and designated
K35-175 and contains the epothilone gene cluster from module 7 through two open reading
frames past the epoL gene.
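
The counter-selection logic described above can be summarized as a small decision rule: galactose resistance alone does not distinguish loss of galK by recombination from a galK point mutation, so sensitivity to both antibiotics is used as the second criterion. The following sketch encodes that rule; the class and function names are illustrative assumptions, not terms from the protocol.

```python
# Minimal sketch of the galK counter-selection screen described in the text.
# Names (Phenotype, classify) are hypothetical and used only for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    galactose_resistant: bool
    kanamycin_resistant: bool
    oxytetracycline_resistant: bool


def classify(p: Phenotype) -> str:
    """Classify a colony according to the marker logic in the text."""
    if not p.galactose_resistant:
        # galK still functional: the colony would not grow on 1% galactose.
        return "not selected: galK still active"
    if not p.kanamycin_resistant and not p.oxytetracycline_resistant:
        # Both markers lost together with galK: consistent with recombination.
        return "candidate recombinant: verify by Southern blot"
    # Galactose resistant but a marker is retained: consistent with a galK mutation.
    return "likely galK mutation: markers retained"


colonies = {
    "colony_1": Phenotype(True, False, False),
    "colony_2": Phenotype(True, True, True),
    "colony_3": Phenotype(False, True, True),
}

results = {name: classify(p) for name, p in colonies.items()}
for name, verdict in results.items():
    print(name, "->", verdict)
```

The same rule applies to each later round of galactose selection in this Example, since every round uses the kanamycin resistance and galK cassette in the same way.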

To introduce modules 1 through 7, the above process is repeated once
more. The plasmid pKOS90-22 is linearized with DraI and electroporated into K35-175 to
create K35-180. This strain is electroporated with the tetracycline resistant version of
pKOS35-70.1A2, pKOS90-38, and colonies resistant to oxytetracycline are selected. This
creates strain K35-185. Recombinants that now have the whole epothilone gene cluster are
selected by resistance to 1% galactose. This results in strain K35-188. This strain contains
all the epothilone genes as well as all potential promoters. This strain is fermented and
tested for the production of epothilones A and B.

To clone the whole gene cluster as one fragment, a bacterial artificial chromosome
(BAC) library is constructed. First, SMP44 cells are embedded in agarose and lysed
according to the BIO-RAD genomic DNA plug kit. DNA plugs are partially digested with
a restriction enzyme, such as Sau3AI or HindIII, and electrophoresed on a FIGE or CHEF
gel. DNA fragments are isolated by electroeluting the DNA from the agarose or using
gelase to degrade the agarose. The method of choice to isolate the fragments is
electroelution, as described in Strong et al., 1997, Nucleic Acids Res. 19: 3959-3961,
incorporated herein by reference. The DNA is ligated into the BAC (pBeloBAC11) cleaved
with the appropriate enzyme. A map of pBeloBAC11 is shown below.

The DNA is electroporated into DH10B cells by the method of Sheng et al., 1995,
Nucleic Acids Res. 23: 1990-1996, incorporated herein by reference, to create an
S. cellulosum genomic library. Colonies are screened using a probe from the NRPS region
of the epothilone cluster. Positive clones are picked and DNA is isolated for restriction
analysis to confirm the presence of the complete gene cluster. This positive clone is
designated pKOS35-178.
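
When planning how many BAC colonies to screen, a common rule of thumb is the Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the ratio of average insert size to genome size and P is the desired probability that a given locus is represented. The sketch below applies it with placeholder values; the genome size, insert size, and probability are hypothetical assumptions chosen only to illustrate the calculation and are not taken from the source.

```python
# Minimal sketch: Clarke-Carbon estimate of library size.
# All numeric inputs below are hypothetical placeholders, not values from the text.
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float) -> int:
    """Return the number of clones N such that a given locus is present with the given probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


assumed_genome_bp = 12_000_000  # hypothetical genome size
assumed_insert_bp = 100_000     # hypothetical average BAC insert size

n_99 = clones_needed(assumed_genome_bp, assumed_insert_bp, 0.99)
n_90 = clones_needed(assumed_genome_bp, assumed_insert_bp, 0.90)

print(f"clones for P = 0.99: {n_99}")
print(f"clones for P = 0.90: {n_90}")
```

Note that this estimate covers a single locus; recovering the complete gene cluster on one clone additionally requires inserts larger than the cluster itself, so the figure should be read as a lower bound for planning purposes.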

To create a strain that can be used to introduce pKOS35-178, a plasmid,
pKOS35-164, is constructed that contains regions of homology that are upstream and
downstream of the epothilone gene cluster flanked by the dev locus and containing the
kanamycin resistance galK cassette, analogous to plasmids pKOS90-22 and pKOS35-154. This
plasmid is linearized with DraI and electroporated into M. xanthus, in accordance with the
method of Kafeshi et al., 1995, Mol. Microbiol. 15: 483-494, to create K35-183. The
plasmid pKOS35-178 can be introduced into K35-183 by electroporation or by
transduction with bacteriophage P1, and chloramphenicol resistant colonies are selected.
Alternatively, a version of pKOS35-178 that contains the origin of conjugative transfer
from pRP4 can be constructed for transfer of DNA from E. coli to K35-183. This plasmid
is made by first constructing a transposon containing the oriT region from RP4 and the
tetracycline resistance marker from pACYC184 and then transposing the transposon in
vitro or in vivo onto pKOS35-178. This plasmid is transformed into S17-1 and conjugated
into M. xanthus. This strain, K35-190, is grown in the presence of 1% galactose to select
for the second recombination event. This strain contains all the epothilone genes as well as
all potential promoters. This strain will be fermented and tested for the production of
epothilones A and B.

Besides being integrated into the dev locus, pKOS35-178 can also be integrated into a
phage attachment site using integration functions from myxophages Mx8 or Mx9. A
transposon is constructed that contains the integration genes and att site from either Mx8
or Mx9 along with the tetracycline resistance gene from pACYC184. Alternative versions of
this transposon may have only the attachment site. In this version, the integration genes are
then supplied in trans by coelectroporation of a plasmid containing the integrase gene or
by having the integrase protein expressed in the electroporated strain from any constitutive
promoter, such as the mgl promoter (see Magrini et al., Jul. 1999, J. Bact. 181(13):
4062-4070, incorporated herein by reference). Once the transposon is constructed, it is
transposed onto pKOS35-178 to create pKOS35-191. This plasmid is introduced into
Myxococcus xanthus as described above. This strain contains all the epothilone genes as
well as all potential promoters. This strain is fermented and tested for the production of
epothilones A and B.

Once the epothilone genes have been established in a strain of Myxococcus 
xanthus, manipulation of any part of the gene cluster, such as changing promoters or 
swapping modules, can be performed using the kanamycin resistance and galK cassette. 

Cultures of Myxococcus xanthus containing the epo genes are grown in a number
of media and examined for production of epothilones. If the levels of production of
epothilones (in particular B or D) are too low to permit large scale fermentation, the
M. xanthus-producing clones are subjected to media development and strain improvement,
as described below for enhancing production in Streptomyces.

Example 4

Construction of a Streptomyces Expression Vector
The present invention provides recombinant expression vectors for the
heterologous expression of modular polyketide synthase genes in Streptomyces hosts.
These vectors include expression vectors that employ the actI promoter that is regulated by
the gene actII ORF4 to allow regulated expression at high levels when growing cells enter
stationary phase. Among the vectors available are plasmids pRM1 and pRM5, and
derivatives thereof such as pCK7, which are stable, low copy plasmids that carry the
marker for thiostrepton resistance in actinomycetes. Such plasmids can accommodate
large inserts of cloned DNA and have been used for the expression of the DEBS PKS in
S. coelicolor and S. lividans, the picromycin PKS genes in S. lividans, and the
oleandomycin PKS genes in S. lividans. See U.S. Patent No. 5,712,146. Those of skill in
the art recognize that S. lividans does not make the tRNA that recognizes the TTA codon
for leucine until late-stage growth and that if production of a protein is desired earlier, then
appropriate codon modifications can be made.
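
One way to find where such codon modifications might be needed is to scan a coding sequence for in-frame TTA codons. The sketch below does this and, as one possible modification, swaps each in-frame TTA for CTG, a synonymous leucine codon. The toy sequences and function names are hypothetical, and the choice of CTG as the replacement is an assumption made for illustration rather than a recommendation from the source.

```python
# Minimal sketch: locate in-frame TTA (leucine) codons and replace them with a synonymous codon.
# The sequences below are toy examples, not genes from the epothilone cluster.


def in_frame_tta_positions(cds: str) -> list:
    """Return 0-based nucleotide offsets of TTA codons in reading frame 0."""
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "TTA"]


def recode_tta(cds: str, replacement: str = "CTG") -> str:
    """Replace each in-frame TTA codon with another leucine codon (CTG by default)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(replacement if codon == "TTA" else codon for codon in codons)


toy_cds = "ATGTTAGCCCTGTTAACGTAA"   # ATG TTA GCC CTG TTA ACG TAA
out_of_frame = "ATGCTTAGC"          # contains TTA, but not in frame: ATG CTT AGC

positions = in_frame_tta_positions(toy_cds)
recoded = recode_tta(toy_cds)

print("in-frame TTA offsets:", positions)
print("recoded sequence:   ", recoded)
print("out-of-frame check: ", in_frame_tta_positions(out_of_frame))
```

Only in-frame occurrences matter here, which is why the scan steps through the sequence three bases at a time rather than using a plain substring search.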

Plasmid pCK7 (map; labeled feature: PactI)

Another vector is a derivative of plasmid pSET152 and comprises the actII ORF4-PactI
expression system but carries the selectable marker for apramycin resistance. These
vectors contain the attP site and integrase gene of the actinophage phiC31 and do not
replicate autonomously in Streptomyces hosts but integrate by site specific recombination
into the chromosome at the attachment site for phiC31 after introduction into the cell.
Derivatives of pCK7 and pSET152 have been used together for the heterologous
production of a polyketide, with different PKS genes expressed from each plasmid. See
U.S. patent application Serial No. 60/129,731, filed 16 Apr. 1999, incorporated herein by
reference.

Plasmid pKOS010-153, a pSET152 Derivative

The need to develop expression vectors for the epothilone PKS that function in
Streptomyces is significant. The epothilone compounds are currently produced in the slow
growing, genetically intractable host Sorangium cellulosum or are made synthetically. The
streptomycetes, bacteria that produce more than 70% of all known antibiotics and
important complex polyketides, are excellent hosts for production of epothilones and
epothilone derivatives. S. lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain cloned
heterologous PKS genes, express them at high levels under controlled conditions, and
modify the corresponding PKS proteins (e.g. phosphopantetheinylation) so that they are
capable of production of the polyketide they encode. Furthermore, these hosts contain the
necessary pathways to produce the substrates required for polyketide synthesis, e.g.
malonyl CoA and methylmalonyl CoA. A wide variety of cloning and expression vectors
are available for these hosts, as are methods for the introduction and stable maintenance of
large segments of foreign DNA. Relative to the slow growing Sorangium host, S. lividans
and S. coelicolor grow well on a number of media and have been adapted for high level
production of polyketides in fermentors. A number of approaches are available for yield
improvements, including rational approaches to increase expression rates, increase
precursor supply, etc. Empirical methods to increase the titers of the polyketides, long
since proven effective for numerous other polyketides produced in streptomycetes, can
also be employed for the epothilone and epothilone derivative producing host cells of the
invention.

To produce epothilones by fermentation in a heterologous Streptomyces host, the
epothilone PKS (including the NRPS module) genes are cloned in two segments in
derivatives of pCK7 (loading domain through module 6) and pKOS010-153 (modules 7
through 9). The two plasmids are introduced into S. lividans employing selection for
thiostrepton and apramycin resistance. In this arrangement, the pCK7 derivative replicates
autonomously whereas the pKOS010-153 derivative is integrated in the chromosome. In
both vectors, expression of the epothilone genes is from the actI promoter resident within
the plasmid.

To facilitate the cloning, the two epothilone PKS encoding segments (one for the
loading domain through module six and one for modules seven through nine) were cloned
as translational fusions with the N-terminal segment of the KS domain of module 5 of the
ery PKS. High level expression has been demonstrated from this promoter employing KS5
as the first translated sequence, see Jacobsen et al., 1998, Biochemistry 37: 4928-4934,
incorporated herein by reference. A convenient BsaBI site is contained within the DNA
segment encoding the amino acid sequence EPIAV that is highly conserved in many KS
domains including the KS-encoding regions of epoA and of module 7 in epoE.

The expression vector for the loading domain and modules one through six of the
epothilone PKS was designated pKOS039-124, and the expression vector for modules
seven through nine was designated pKOS039-126. Those of skill in the art will recognize
that other vectors and vector components can be used to make equivalent vectors. Because
preferred expression vectors of the invention, described below and derived from
pKOS039-124 and pKOS039-126, have been deposited under the terms of the Budapest
Treaty, only a summary of the construction of plasmids pKOS039-124 and pKOS039-126
is provided below.

The eryKS5 linker coding sequences were cloned as an ~0.4 kb PacI-BglII
restriction fragment from plasmid pKOS010-153 into pKOS039-98 to construct plasmid
pKOS039-117. The coding sequences for the eryKS5 linker were linked to those for the
epothilone loading domain by inserting the ~8.7 kb EcoRI-XbaI restriction fragment from
cosmid pKOS35-70.1A2 into EcoRI-XbaI digested plasmid pLitmus28. The ~3.4 kb
BsaBI-NotI and ~3.7 kb NotI-HindIII restriction fragments from the resulting plasmid
were inserted into BsaBI-HindIII digested plasmid pKOS039-117 to construct plasmid
pKOS039-120. The ~7 kb PacI-XbaI restriction fragment of plasmid pKOS039-120 was
inserted into plasmid pKA018' to construct plasmid pKOS039-123. The final pKOS039-124
expression vector was constructed by ligating the ~34 kb XbaI-AvrII restriction
fragment of cosmid pKOS35-70.1A2 with the ~21.1 kb AvrII-XbaI restriction fragment of
pKOS039-123.

The plasmid pKOS039-126 expression vector was constructed as follows. First the
coding sequences for module 7 were linked from cosmids pKOS35-70.4 and pKOS35-79.85
by cloning the ~6.9 kb BglII-NotI restriction fragment of pKOS35-70.4 and the ~5.9
kb NotI-HindIII restriction fragment of pKOS35-79.85 into BglII-HindIII digested
plasmid pLitmus28 to construct plasmid pKOS039-119. The ~12 kb NdeI-NheI restriction
fragment of cosmid pKOS35-79.85 was cloned into NdeI-XbaI digested plasmid
pKOS039-119 to construct plasmid pKOS039-122.

To fuse the eryKS5 linker coding sequences with the coding sequences for module
7, the ~1 kb BsaBI-BglII restriction fragment derived from cosmid pKOS35-70.4 was
cloned into BsaBI-BclI digested plasmid pKOS039-117 to construct plasmid pKOS039-121.
The ~21.5 kb AvrII restriction fragment from plasmid pKOS039-122 was cloned into
AvrII-XbaI digested plasmid pKOS039-121 to construct plasmid pKOS039-125. The
~21.8 kb PacI-EcoRI restriction fragment of plasmid pKOS039-125 was ligated with the
~9 kb PacI-EcoRI restriction fragment of plasmid pKOS039-44 to construct pKOS039-126.

Plasmids pKOS039-124 and pKOS039-126 were introduced into S. lividans K4-114
sequentially employing selection for the corresponding drug resistance marker. Because
plasmid pKOS039-126 does not replicate autonomously in streptomycetes, the selection is
for cells in which the plasmid has integrated in the chromosome by site-specific
recombination at the attB site of phiC31. Because the plasmid stably integrates, continued
selection for apramycin resistance is not required. Selection can be maintained if desired.
The presence of thiostrepton in the medium is maintained to ensure continued selection for
plasmid pKOS039-124. Plasmids pKOS039-124 and pKOS039-126 were transformed into
Streptomyces lividans K4-114, and transformants containing the plasmids were cultured
and tested for production of epothilones. Initial tests did not indicate the presence of an
epothilone.

To improve production of epothilones from these vectors, the eryKS5 linker
sequences were replaced by epothilone PKS gene coding sequences, and the vectors were
introduced into Streptomyces coelicolor CH999. To amplify by PCR coding sequences
from the epoA gene coding sequence, two oligonucleotide primers were used:

N39-73, 5'-GCTTAATTAAGGAGGACACATATGCCCCTCGTGGCGGATCGTCC-3'; and
N39-74, 5'-GCGGATCCTCGAATCACCGCCAATATC-3'.
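
As a quick sanity check on the primer design, the restriction sites engineered into N39-73 and N39-74 can be located with a plain substring search. The sketch below is illustrative only: the helper name `find_sites` and the small `SITES` table are assumptions introduced here and are not part of the described method; the recognition sequences shown for PacI, NdeI, and BamHI are the standard ones.

```python
# Illustrative helper (an assumption, not part of the described method):
# locate restriction enzyme recognition sites in a primer by substring search.
SITES = {
    "PacI": "TTAATTAA",
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Primer sequences as given in the text, written 5' to 3'.
N39_73 = "GCTTAATTAAGGAGGACACATATGCCCCTCGTGGCGGATCGTCC"
N39_74 = "GCGGATCCTCGAATCACCGCCAATATC"

print("N39-73:", find_sites(N39_73))
print("N39-74:", find_sites(N39_74))
```

The forward primer carries a PacI site followed by an NdeI site that overlaps the ATG start codon, and the reverse primer carries a BamHI site, which matches the PacI and BamHI digestion of the PCR product described in the text.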

The template DNA was derived from cosmid pKOS35-70.8A3. The ~0.8 kb PCR product
was digested with restriction enzymes PacI and BamHI and then ligated with the ~2.4 kb
BamHI-NotI and the ~6.4 kb PacI-NotI restriction fragments of plasmid pKOS039-120 to
construct plasmid pKOS039-136. To make the expression vector for the epoA, epoB,
epoC, and epoD genes, the ~5 kb PacI-AvrII restriction fragment of plasmid pKOS039-136
was ligated with the ~50 kb PacI-AvrII restriction fragment of plasmid pKOS039-124
to construct the expression plasmid pKOS039-124R. Plasmid pKOS039-124R has been
deposited with the ATCC under the terms of the Budapest Treaty and is available under
accession number .
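
The fragment sizes quoted for these ligations can be cross-checked by simple addition. The following is a minimal bookkeeping sketch, assuming the approximate sizes stated in the text; the dictionary layout and the helper name `expected_size_kb` are illustrative assumptions.

```python
# Approximate fragment sizes in kb, as quoted in the text for each ligation.
# The data layout and helper name are illustrative assumptions.
LIGATIONS = {
    "pKOS039-124": [34.0, 21.1],       # XbaI-AvrII cosmid fragment + AvrII-XbaI vector fragment
    "pKOS039-136": [0.8, 2.4, 6.4],    # PCR product + two pKOS039-120 fragments
    "pKOS039-124R": [5.0, 50.0],       # PacI-AvrII fragments of pKOS039-136 and pKOS039-124
}


def expected_size_kb(fragments):
    """Sum the fragment sizes and round to 0.1 kb, matching the precision of the text."""
    return round(sum(fragments), 1)


for plasmid, fragments in LIGATIONS.items():
    print(f"{plasmid}: ~{expected_size_kb(fragments)} kb")
```

On these approximate figures, pKOS039-124 and pKOS039-124R come out at nearly the same total size, as would be expected when a short linker-encoding segment is exchanged for a segment of similar length.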

To amplify by PCR sequences from the epoE gene coding sequence, two
oligonucleotide primers were used:

N39-67A, 5'-GCTTAATTAAGGAGCACACATATGACCGACCGAGAAGGCCAGCTCCTGGA-3'; and
N39-68, 5'-GGACCTAGGC3GGATGCCGGCGTCT-3'.

The template DNA was derived from cosmid pKOS35-70.1A2. The ~0.4 kb
amplification product was digested with restriction enzymes PacI and AvrII and ligated
with either the ~29.5 kb PacI-AvrII restriction fragment of plasmid pKOS039-126 or the
~23.8 kb PacI-AvrII restriction fragment of plasmid pKOS039-125 to construct plasmid
pKOS039-126R or plasmid pKOS039-125R, respectively. Plasmid pKOS039-126R was
deposited with the ATCC under the terms of the Budapest Treaty and is available under
accession number .

The plasmid pair pKOS039-124R and pKOS039-126R (as well as the plasmid pair
pKOS039-124 and pKOS039-126) contain the full complement of epoA, epoB, epoC,
epoD, epoE, epoF, epoK, and epoL genes. The latter two genes are present on plasmid
pKOS039-126R (as well as plasmid pKOS039-126); however, to ensure that these genes
were expressed at high levels, another expression vector of the invention, plasmid
pKOS039-141 (Figure 8), was constructed in which the epoK and epoL genes were placed
under the control of the ermE* promoter.

The epoK gene sequences were amplified by PCR using the oligonucleotide
primers:

N39-69, 5'-AGGCATGCATATGACCCAGGAGCARGCGAATCAGAGTG-3'; and
N39-70, 5'-CCAAGCTTTATCCAGCTTTGGAGGGCTTCAAG-3'.

The epoL gene sequences were amplified by PCR using the oligonucleotide
primers:

N39-71A, 5'-GTAAGCTTAGGAGGACACATATGATGCAACTCGCGCGCGGGTG-3'; and
N39-72, 5'-GCCTGCAGGCTCAGGCTTGCGCAGAGCGT-3'.

The template DNA for the amplifications was derived from cosmid pKOS35-79.85.
The PCR products were subcloned into pCR-Script for sequence analysis. Then, the
epoK and epoL genes were isolated from the clones as NdeI-HindIII and HindIII-EcoRI
restriction fragments, respectively, and ligated with the ~6 kb NdeI-EcoRI restriction
fragment of plasmid pKOS039-134B, which contains the ermE* promoter, to construct
plasmid pKOS039-140. The ~2.4 kb NheI-PstI restriction fragment of plasmid pKOS039-140
was cloned into XbaI-PstI digested plasmid pSAM-Hyg, a plasmid pSAM2 derivative
containing a hygromycin resistance conferring gene, to construct plasmid pKOS039-141.

Another variant of plasmid pKOS039-126R was constructed to provide the epoE
and epoF genes on an expression vector without the epoK and epoL genes. This plasmid,
pKOS045-12 (Figure 9), was constructed as follows. Plasmid pXH106 (described in J.
Bact., 1991, 173: 5573-5577, incorporated herein by reference) was digested with
restriction enzymes StuI and BamHI, and the ~2.8 kb restriction fragment containing the
xylE and hygromycin resistance conferring genes was isolated and cloned into EcoRV-BglII
digested plasmid pLitmus28. The ~2.8 kb NcoI-AvrII restriction fragment of the
resulting plasmid was ligated to the ~18 kb PacI-BspHI restriction fragment of plasmid
pKOS039-125R and the ~9 kb SpeI-PacI restriction fragment of plasmid pKOS039-42 to
construct plasmid pKOS045-12.

To construct an expression vector that comprised only the epoL gene, plasmid
pKOS039-141 was partially digested with restriction enzyme NdeI, the ~9 kb NdeI
restriction fragment was isolated, and the fragment then circularized by ligation to yield
plasmid pKOS039-150.

The various expression vectors described above were then transformed into
Streptomyces coelicolor CH999 and S. lividans K4-114 in a variety of combinations, and the
transformed host cells were fermented on plates and in liquid culture (R5 medium, which is
identical to R2YE medium without agar). Typical fermentation conditions follow. First, a
seed culture of about 5 mL containing 50 µg/L thiostrepton was inoculated and grown at
30°C for two days. Then, about 1 to 2 mL of the seed culture was used to inoculate a
production culture of about 50 mL containing 50 µg/L thiostrepton and 1 mM cysteine,
and the production culture was grown at 30°C for 5 days. Also, the seed culture was used
to prepare plates of cells (the plates contained the same media as the production culture
with 10 mM propionate), which were grown at 30°C for nine days.

Certain of the Streptomyces coelicolor cultures and culture broths were analyzed
for production of epothilones. The liquid cultures were extracted three times with
equal volumes of ethyl acetate, the organic extracts combined and evaporated, and the
residue dissolved in acetonitrile for LC/MS analysis. The agar plate media was chopped
and extracted twice with equal volumes of acetone, and the acetone extracts were
combined and evaporated to an aqueous slurry, which was extracted three times with equal
volumes of ethyl acetate. The organic extracts were combined and evaporated, and the
residue dissolved in acetonitrile for LC/MS analysis.

Production of epothilones was assessed using LC-mass spectrometry. The output
flow from the UV detector of an analytical HPLC was split equally between a
Perkin-Elmer/Sciex API100LC mass spectrometer and an Alltech 500 evaporative light
scattering detector. Samples were injected onto a 4.6 x 150 mm reversed phase HPLC column
(MetaChem 5 µ ODS-3 Inertsil) equilibrated in water with a flow rate of 1.0 mL/min. UV
detection was set at 250 nm. Sample components were separated using H2O for 1 minute,
then a linear gradient from 0 to 100% acetonitrile over 10 minutes. Under these
conditions, epothilone A elutes at 10.2 minutes and epothilone B elutes at 10.5 minutes.
The identity of these compounds was confirmed by the mass spectra obtained using an
atmospheric pressure chemical ionization source with orifice and ring voltages set at 75 V
and 300 V, respectively, and a mass resolution of 0.1 amu. Under these conditions,
epothilone A shows [M+H] at 494.4 amu, with observed fragments at 476.4, 318.3, and
306.4 amu. Epothilone B shows [M+H] at 508.4 amu, with observed fragments at 490.4,
320.3, and 302.4 amu.
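
The gradient program and the reported masses lend themselves to a small worked calculation. The sketch below is illustrative and rests on stated assumptions: the function name `percent_acetonitrile` is introduced here, and the computed value is the nominal solvent composition programmed at the pump, ignoring system dwell volume and column dead time. The mass differences are taken directly from the values reported above.

```python
def percent_acetonitrile(t_min, hold_min=1.0, ramp_min=10.0):
    """Nominal % acetonitrile at time t_min: water hold, then a linear 0-100% ramp."""
    if t_min <= hold_min:
        return 0.0
    if t_min >= hold_min + ramp_min:
        return 100.0
    return 100.0 * (t_min - hold_min) / ramp_min


# Nominal composition at the reported retention times.
for name, rt in (("epothilone A", 10.2), ("epothilone B", 10.5)):
    print(f"{name}: {percent_acetonitrile(rt):.0f}% acetonitrile at {rt} min")

# Difference between [M+H] and the first listed fragment for each compound.
for name, parent, fragment in (
    ("epothilone A", 494.4, 476.4),
    ("epothilone B", 508.4, 490.4),
):
    print(f"{name}: [M+H] minus first fragment = {parent - fragment:.1f} amu")

# Offset between the two parent ions.
print(f"B minus A: {508.4 - 494.4:.1f} amu")
```

A difference of about 18 amu between each parent ion and its first fragment is consistent with loss of water, and the 14 amu offset between the two parent ions is consistent with the additional methyl group that distinguishes epothilone B (R = CH3) from epothilone A (R = H).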

Transformants containing the vector pairs pKOS039-124R and pKOS039-126R or
pKOS039-124 and pKOS039-126R produced detectable amounts of epothilones A and B.
Transformants containing these plasmid pairs and the additional plasmid pKOS039-141
produced similar amounts of epothilones A and B, indicating that the additional copies of
the epoK and epoL genes were not required for production under the test conditions
employed. Thus, these transformants produced epothilones A and B when recombinant
epoA, epoB, epoC, epoD, epoE, epoF, epoK, and epoL genes were present. In some
cultures, it was observed that the absence of propionate increased the proportion of
epothilone B to epothilone A.

Transformants containing the plasmid pair pKOS039-124R and pKOS045-12
produced epothilones C and D, as did transformants containing this plasmid pair and the 
additional plasmid pKOS039-150. These results showed that the epoL gene was not 
required under the test conditions employed to form the C-12-C-13 double bond. These 
results indicate that either the epothilone PKS gene alone is able to form the double bond 
or that Streptomyces coelicolor expresses a gene product able to convert epothilones G and 
H to epothilones C and D. Thus, these transformants produced epothilones C and D when 
recombinant epoA, epoB, epoC, epoD, epoE, and epoF genes were present. 

The heterologous expression of the epothilone PKS described herein is believed to 
represent the recombinant expression of the largest proteins and active enzyme complex 
that have ever been expressed in a recombinant host cell. The epothilone producing 
Streptomyces coelicolor transformants exhibited growth characteristics indicating that 
either the epothilone PKS genes, or their products, or the epothilones inhibited cell growth 
or were somewhat toxic to the cells. Any such inhibition or toxicity could be due to 
accumulation of the epothilones in the cell, and it is believed that the native Sorangium 
producer cells may contain transporter proteins that in effect pump epothilones out of the 
cell. Such transporter genes are believed to be included among the ORFs located 
downstream of the epoK gene and described above. Thus, the present invention provides 
Streptomyces and other host cells that include recombinant genes that encode the products 
of one or more, including all, of the ORFs in this region. 

For example, each ORF can be cloned behind the ermE* promoter, see Stassi et
al., 1998, Appl. Microbiol. Biotechnol. 49: 725-731, incorporated herein by reference, in a
pSAM2-based plasmid that can integrate into the chromosome of Streptomyces coelicolor
and S. lividans at a site distinct from attB of phage phiC31, see Smokvina et al., 1990,
Gene 94: 53-59, incorporated herein by reference. A pSAM2-based vector carrying the
gene for hygromycin resistance is modified to carry the ermE* promoter along with
additional cloning sites. Each ORF downstream is PCR cloned into the vector, which is
then introduced into the host cell (also containing pKOS039-124R and pKOS039-126R or
other expression vectors of the invention) employing hygromycin selection. Clones
carrying each individual gene downstream from epoK are analyzed for increased
production of epothilones.

Additional fermentation and strain improvement efforts can be conducted as
illustrated by the following. The levels of expression of the PKS genes in the various
constructs can be measured by assaying the levels of the corresponding mRNAs (by
quantitative RT PCR) relative to the levels of another heterologous PKS mRNA (e.g.
picromycin) produced from genes cloned in similar expression vectors in the same host. If
one of the epothilone transcripts is underproduced, experiments to enhance its production
by cloning the corresponding DNA segment in a different expression vector are
conducted. For example, multiple copies of any one or more of the epothilone PKS genes
can be introduced into a cell if one or more gene products are rate limiting for
biosynthesis. If the basis for low level production is not related to low level PKS gene
expression (at the RNA level), an empirical mutagenesis and screening approach that is
the backbone of yield improvement of every commercially important fermentation product
is undertaken. Spores are subjected to UV, X-ray or chemical mutagens, and individual
survivors are plated and picked and tested for the level of compound produced in small
scale fermentations. Although this process can be automated, one can examine several
thousand isolates for quantifiable epothilone production using the susceptible fungus
Mucor hiemalis as a test organism.

Another method to increase the yield of epothilones produced is to change the KSY
domain of the loading domain of the epothilone PKS to a KSQ domain. Such altered
loading domains can be constructed in any of a variety of ways, but one illustrative
method follows. Plasmid pKOS039-124R of the invention can be conveniently used as a
starting material. To amplify DNA fragments useful in the construction, four
oligonucleotide primers are employed:

N39-83: 5'-CCGGTATCCACCGCGACACACGGC-3',
N39-84: 5'-GCCAGTCGTCCTCGCTCGTGGCCGTTC-3',

and N39-73 and N39-74, which have been described above. The PCR fragment generated
with N39-73 and N39-83 and the PCR fragment generated with N39-74 and N39-84 are
treated with restriction enzymes PacI and BamHI, respectively, and ligated with the ~3.1
kb PacI-BamHI fragment of plasmid pKOS039-120 to construct plasmid pKOS039-148.
The ~0.8 kb PacI-BamHI restriction fragment of plasmid pKOS039-148 (comprising the
two PCR amplification products) is ligated with the ~2.4 kb BamHI-NotI restriction
fragment and the ~6.4 kb PacI-NotI restriction fragment of plasmid pKOS039-120 to
construct pKOS039-136Q. The ~5 kb PacI-AvrII restriction fragment of plasmid
pKOS039-136Q is ligated to the ~50 kb PacI-AvrII restriction fragment of plasmid
pKOS039-124 to construct plasmid pKOS039-124Q. Plasmids pKOS039-124Q and
pKOS039-126R are then transformed into Streptomyces coelicolor CH999 for epothilone
production.

The epoA through epoF, optionally with epoK or with epoK plus epoL, genes
cloned and expressed are sufficient for the synthesis of epothilone compounds, and the
distribution of the C-12 H to C-12 methyl congeners appears to be similar to that seen in
the natural host (A:B::2:1). This ratio reflects that the AT domain of module 4 more
closely resembles that of the malonyl rather than methylmalonyl specifying AT consensus
domains. Thus, epothilones D and B are produced at lower quantities than their C-12
unmethylated counterparts C and A. The invention provides PKS genes that produce
epothilone D and/or B exclusively. Specifically, methylmalonyl CoA specifying AT
domains from a number of sources (e.g. the narbonolide PKS, the rapamycin PKS, and
others listed above) can be used to replace the naturally occurring AT domain in module 4.
The exchange is performed by direct cloning of the incoming DNA into the appropriate
site in the epothilone PKS encoding DNA segment or by gene replacement through
homologous recombination.

For gene replacement through homologous recombination, the donor sequence to
be exchanged is placed in a delivery vector between segments of at least 1 kb in length
that flank the AT domain of epo module 4 encoding DNA. Crossovers in the homologous
regions result in the exchange of the epo AT4 domain with that on the delivery vector.
Because pKOS039-124 and pKOS039-124R contain AT4 coding sequences, they can be
used as the host DNA for replacement. The adjacent DNA segments are cloned in one of a
number of E. coli plasmids that are temperature sensitive for replication. The heterologous
AT domains can be cloned in these plasmids in the correct orientation between the
homologous regions as cassettes, making it possible to perform several AT exchanges
simultaneously. The reconstructed plasmid (pKOS039-124* or pKOS039-124R*) is tested
for ability to direct the synthesis of epothilone B and/or D by introducing it along with
pKOS039-126 or pKOS039-126R in Streptomyces coelicolor and/or S. lividans.

Because the titers of the polyketide can vary from strain to strain carrying the
different gene replacements, the invention provides a number of heterologous
methylmalonyl CoA specifying AT domains to ensure production of epothilone D at
titers equivalent to that of the C and D mixture produced in the Streptomyces coelicolor
host described above. In addition, larger segments of the donor genes can be used for the
replacements, including, in addition to the AT domain, adjacent upstream and downstream
sequences that correspond to an entire module. If an entire module is used for the
replacement, the KS, methylmalonyl AT, DH, KR, ACP-encoding DNA segment can be
obtained from, for example and without limitation, the DNA encoding the tenth module of
the rapamycin PKS, or the first or fifth modules of the FK-520 PKS.


Example 5 

Heterologous Expression of EpoK and Conversion of Epothilone D to Epothilone B

This Example describes the construction of E. coli expression vectors for epoK.
The epoK gene product was expressed in E. coli as a fusion protein with a polyhistidine
tag (his tag). The fusion protein was purified and used to convert epothilone D to
epothilone B.




Plasmids were constructed to encode fusion proteins composed of six histidine
residues fused to either the amino or carboxy terminus of EpoK. The following oligos
were used to construct the plasmids:

55-101.a-1:


^^- l *- at GCACCACCACCACCACCACATGACACAGGAGCAAGCGAAT-CAGAGTG^G-3' 

55-101.b:

5'-AAAAAGGATCCTTAATCCAGCTTTGGAGGGCTT-3',

55-101.c:

5'-AAAAACATATGACACAGGAGCAAGCGAAT-3', and

55-101.d:

5'-AAAAAGGATCCTTAGTGGTGGTGGTGGTGGTGTCCAGCTTTGGAGGGCTTCAAGATGAC-3'.

The plasmid encoding the amino terminal his tag fusion protein, pKOS55-121, was
constructed using primers 55-101.a-1 and 55-101.b, and the one encoding the carboxy
terminal his tag, pKOS55-129, was constructed using primers 55-101.c and 55-101.d in
PCR reactions containing pKOS35-83.5 as the template DNA. Plasmid pKOS35-83.5
contains the ~5 kb NotI fragment comprising the epoK gene ligated into pBluescriptSKII+
(Stratagene). The PCR products were cleaved with restriction enzymes BamHI and NdeI
and ligated into the BamHI and NdeI sites of pET22b (Invitrogen). Both plasmids were
sequenced to verify that no mutations were introduced during the PCR amplification.
Protein gels were run as known in the art.

Purification of EpoK was performed as follows. Plasmids pKOS55-121 and
pKOS55-129 were transformed into BL21(DE3) containing the groELS expressing
plasmid pREP4-groELS (Caspers et al., 1994, Cellular and Molecular Biology 40(5):
635-644). The strains were inoculated into 250 mL of M9 medium supplemented with 2
mM MgSO4, 1% glucose, 20 mg thiamin, 5 mg FeCl2, 4 mg CaCl2 and 50 mg levulinic
acid. The cultures were grown to an OD600 between 0.4 and 0.6, at which point IPTG was
added to 1 mM, and the cultures were allowed to grow for an additional two hours. The
cells were harvested and frozen at -80°C. The frozen cells were resuspended in 10 mL of
buffer 1 (5 mM imidazole, 500 mM NaCl, and 45 mM Tris pH 7.6) and were lysed by
sonicating three times for 15 seconds each on setting 8. The cellular debris was pelleted by
spinning in an SS-34 rotor at 16,000 rpm for 30 minutes. The supernatant was removed
and spun again at 16,000 rpm for 30 minutes. The supernatant was loaded onto a 5 mL
nickel column (Novagen), after which the column was washed with 50 mL of buffer 1
(Novagen). EpoK was eluted with a gradient from 5 mM to 1 M imidazole. Fractions
containing EpoK were pooled and dialyzed twice against 1 L of dialysis buffer (45 mM
Tris pH 7.6, 0.2 mM DTT, 0.1 mM EDTA, and 20% glycerol). Aliquots were frozen in
liquid nitrogen and stored at -80°C. The protein preparations were greater than 90% pure.

The EpoK assay was performed as follows (see Betlach et al., Biochem. (1998)
37:14937, incorporated herein by reference). Briefly, reactions consisted of 50 mM Tris
(pH 7.5), 21 µM spinach ferredoxin, 0.132 units of spinach ferredoxin:NADP+
oxidoreductase, 0.8 units of glucose-6-phosphate dehydrogenase, 1.4 mM NADP, and 7.1
mM glucose-6-phosphate, 100 µM or 200 µM epothilone D (a generous gift of S.
Danishefsky), and 1.7 µM amino terminal his tagged EpoK or 1.6 µM carboxy terminal
his tagged EpoK in a 100 µL volume. The reactions were incubated at 30°C for 67
minutes and stopped by heating at 90°C for 2 minutes. The insoluble material was
removed by centrifugation, and 50 µL of the supernatant were analyzed by LC/MS. HPLC
conditions: Metachem 5 µ ODS-3 Inertsil (4.6 x 150 mm); 80% H2O for 1 min, then to
100% MeCN over 10 min at 1 mL/min, with UV (λ=250 nm), ELSD, and MS
detection. Under these conditions, epothilone D eluted at 11.6 min and epothilone B at 9.3
min. The LC/MS spectra were obtained using an atmospheric pressure chemical ionization
source with orifice and ring voltages set at 20 V and 250 V, respectively, at a mass
resolution of 1 amu. Under these conditions, epothilone D shows an [M+H] at m/z 493,
with observed fragments at 405 and 304. Epothilone B shows an [M+H] at m/z 509, with
observed fragments at 491 and 320.
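
To put the assay concentrations in perspective, the amounts of substrate and enzyme in each 100 µL reaction can be computed directly. This is a minimal sketch using only the concentrations and volume stated above; the helper name `nmol` and the loop layout are illustrative assumptions.

```python
def nmol(conc_uM, volume_uL):
    """Amount in nmol for a concentration in µM and a volume in µL."""
    # µM x µL gives pmol; divide by 1000 to express the amount in nmol.
    return conc_uM * volume_uL / 1000.0


REACTION_UL = 100.0  # reaction volume stated in the text

for substrate_uM in (100.0, 200.0):
    for tag, enzyme_uM in (("amino terminal", 1.7), ("carboxy terminal", 1.6)):
        substrate = nmol(substrate_uM, REACTION_UL)
        enzyme = nmol(enzyme_uM, REACTION_UL)
        print(
            f"{substrate_uM:.0f} µM epothilone D, {tag} his tag: "
            f"{substrate:.0f} nmol substrate, {enzyme:.2f} nmol EpoK, "
            f"molar ratio {substrate / enzyme:.1f}:1"
        )
```

The molar ratio of substrate to enzyme gives the number of turnovers each enzyme molecule would need to complete for full conversion of epothilone D under these conditions.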
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The present invention provides methods and reagents for chemobiosynlhesis to 
produce epothilone derivatives in a manner similar to that described to make 6-dEB and 
erythromycin analogs in PCT Pat. Pub. Nos. 99/03986 and 97/02358. Two types of 
feeding substrates are provided: analogs of the NRPS product, and analogs of the module 
3 substrate. The module 2 substrates are used with PKS enzymes with a mutated NRPS- 

like domain, and the module 3 substrates are used with PKS enzymes with a mutated KS 
domain in module 2. 
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The following illustrate module 2 substrates (as N-acetyl cysteamine thioesters) for 
use as substrates for epothilone PKS with modified inactivated NRPS: 
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The module 2 substrates are prepared by activation of the corresponding carboxylic 
acid and treatment with N-acetylcysteamine. Activation methods include formation of the
acid chloride, formation of a mixed anhydride, or reaction with a condensing reagent such
as a carbodiimide.
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Exemplary module 3 substrates, also as NAc thioesters for use as substrates for 
epothilone PKS with KS2 knockout are:






These compounds are prepared in a three-step process. First, the appropriate 
aldehyde is treated with a Wittig reagent or equivalent to form the substituted acrylic ester. 
The ester is saponified to the acid, which is then activated and treated with N- 
acetylcysteamine. 

Illustrative reaction schemes for making module 2 and module 3 substrates follow. 
Additional compounds suitable for making starting materials for polyketide synthesis by 
the epothilone PKS are shown in Figure 2 as carboxylic acids (or aldehydes that can be 
converted to carboxylic acids) that are converted to the N-acylcysteamides for supplying 
to the host cells of the invention. 

A. Thiophene-3-carboxylate N-acetylcysteamine thioester

A solution of thiophene-3-carboxylic acid (128 mg) in 2 mL of dry tetrahydrofuran
under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl
azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added, and the reaction
was allowed to proceed for 12 hours. The mixture was poured into water and extracted
three times with equal volumes of ethyl acetate. The organic extracts were combined,
washed sequentially with water, 1 N HCl, sat. CuSO4, and brine, then dried over MgSO4,
filtered, and concentrated under vacuum. Chromatography on SiO2 using ether followed
by ethyl acetate provided pure product, which crystallized upon standing.

B. Furan-3-carboxylate N-acetylcysteamine thioester 

A solution of furan-3-carboxylic acid (112 mg) in 2 mL of dry tetrahydrofuran
under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl
azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added and the reaction
was allowed to proceed for 12 hours. The mixture was poured into water and extracted
three times with equal volumes of ethyl acetate. The organic extracts were combined,
washed sequentially with water, 1 N HCl, sat. CuSO4, and brine, then dried over MgSO4,
filtered, and concentrated under vacuum. Chromatography on SiO2 using ether followed
by ethyl acetate provided pure product, which crystallized upon standing.

C. Pyrrole-2-carboxylate N-acetylcysteamine thioester

A solution of pyrrole-2-carboxylic acid (112 mg) in 2 mL of dry tetrahydrofuran
under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl
azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added and the reaction
was allowed to proceed for 12 hours. The mixture was poured into water and extracted
three times with equal volumes of ethyl acetate. The organic extracts were combined,
washed sequentially with water, 1 N HCl, sat. CuSO4, and brine, then dried over
MgSO4, filtered, and concentrated under vacuum. Chromatography on SiO2 using ether
followed by ethyl acetate provided pure product, which crystallized upon standing.

D. 2-Methyl-3-(3-thienyl)acrylate N-acetylcysteamine thioester

(1) Ethyl 2-methyl-3-(3-thienyl)acrylate: A mixture of thiophene-3-carboxaldehyde
(1.12 g) and (carbethoxyethylidene)triphenylphosphorane (4.3 g) in dry
tetrahydrofuran (20 mL) was heated at reflux for 16 hours. The mixture was cooled to
ambient temperature and concentrated to dryness under vacuum. The solid residue was
suspended in 1:1 ether/hexane and filtered to remove triphenylphosphine oxide. The
filtrate was filtered through a pad of SiO2 using 1:1 ether/hexane to provide the product
(1.78 g, 91%) as a pale yellow oil.
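
The reported 91% yield can be checked from the stated masses. The sketch below is illustrative: the molecular formulas (C5H4OS for thiophene-3-carboxaldehyde and C10H12O2S for the ethyl ester) are derived from the compound names rather than stated in the text, the aldehyde is assumed to be the limiting reagent because the phosphorane is used in excess, and the helper names are assumptions.

```python
# Standard atomic weights (g/mol), rounded; adequate for yield bookkeeping.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06}


def molar_mass(composition):
    """Molar mass in g/mol from an {element: count} composition."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in composition.items())


def percent_yield(product_g, product_mw, limiting_g, limiting_mw):
    """Percent yield assuming 1:1 stoichiometry with the limiting reagent."""
    return 100.0 * (product_g / product_mw) / (limiting_g / limiting_mw)


# Formulas derived from the compound names (an assumption of this sketch).
ALDEHYDE = {"C": 5, "H": 4, "O": 1, "S": 1}    # thiophene-3-carboxaldehyde
ESTER = {"C": 10, "H": 12, "O": 2, "S": 1}     # ethyl 2-methyl-3-(3-thienyl)acrylate

yield_pct = percent_yield(1.78, molar_mass(ESTER), 1.12, molar_mass(ALDEHYDE))
print(f"aldehyde: {1000 * 1.12 / molar_mass(ALDEHYDE):.2f} mmol")
print(f"ester:    {1000 * 1.78 / molar_mass(ESTER):.2f} mmol")
print(f"yield:    {yield_pct:.0f}%")
```

Under these assumptions the calculated yield rounds to the 91% reported for step (1).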

(2) 2-Methyl-3-(3-thienyl)acrylic acid: The ester from (1) was dissolved in a
mixture of methanol (5 mL) and 8 N KOH (5 mL) and heated at reflux for 30 minutes. The
mixture was cooled to ambient temperature, diluted with water, and washed twice with
ether. The aqueous phase was acidified using 1 N HCl then extracted 3 times with equal
volumes of ether. The organic extracts were combined, dried with MgSO4, filtered, and
concentrated to dryness under vacuum. Crystallization from 2:1 hexane/ether provided the
product as colorless needles.

(3) 2-Methyl-3-(3-thienyl)acrylate N-acetylcysteamine thioester: A solution of
2-Methyl-3-(3-thienyl)acrylic acid (168 mg) in 2 mL of dry tetrahydrofuran under inert
atmosphere was treated with triethylamine (0.56 mL) and diphenylphosphoryl azide (0.45
mL). After 15 minutes, N-acetylcysteamine (0.15 mL) is added and the reaction is allowed
to proceed for 4 hours. The mixture is poured into water and extracted three times with
equal volumes of ethyl acetate. The organic extracts are combined, washed sequentially
with water, 1 N HCl, sat. CuSO4, and brine, then dried over MgSO4, filtered, and
concentrated under vacuum. Chromatography on SiO2 using ethyl acetate provided pure
product, which crystallized upon standing.

The above compounds are supplied to cultures of host cells containing a
recombinant epothilone PKS of the invention in which either the NRPS or the KS domain
of module 2 as appropriate has been inactivated by mutation to prepare the corresponding
epothilone derivative of the invention.

Example 7 

Producing Epothilones and Epothilone Derivatives in Sorangium cellulosum SMP44

The present invention provides a variety of recombinant Sorangium cellulosum
host cells that produce less complex mixtures of epothilones than the naturally occurring
epothilone producers as well as host cells that produce epothilone derivatives. This
Example illustrates the construction of such strains by describing how to make a strain that
produces only epothilones C and D without epothilones A and B. To construct this strain,
an inactivating mutation is made in epoK. Using plasmid pKOS35-83.5, which contains a
NotI fragment harboring the epoK gene, the kanamycin and bleomycin resistance markers
from Tn5 are ligated into the ScaI site of the epoK gene to construct pKOS90-55. The




WO 00/31247 


- 116- 


PCT/US99/27438 


5 


10 


15 


20 


25 


30 


35 


40 


45 


50 


orientation of the resistance markers is such that transcription initiated at the kanamycin 
promoter drives expression of genes immediately downstream of epoK. In other words, the 
mutation should be nonpolar. Next, the origin of conjugative transfer, oriT, from RP4 is 
ligated into pKOS90-55 to create pKOS90-63. This plasmid can be introduced into S17-1 
and conjugated into SMP44. The transconjugants are selected on phleomycin plates as 
previously described. Alternatively, electroporation of the plasmid can be achieved using 
conditions described above for Myxococcus xanthus. 

Because there are three generalized transducing phages for Myxococcus xanthus, 
one can transfer DNA from M. xanthus to SMP44. First, the epoK mutation is constructed 
in M. xanthus by linearizing plasmid pKOS90-55 and electroporating it into M. xanthus. 
Kanamycin resistant colonies are selected and have a gene replacement of epoK. This 
strain is infected with Mx9, Mx8, Mx4 ts18 hft hrm phages to make phage lysates. These 
lysates are then individually infected into SMP44, and phleomycin resistant colonies are 
selected. Once the strain is constructed, standard fermentation procedures, as described 
below, are employed to produce epothilones C and D. 
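
The strain design logic of this Example can be summarized in a short sketch. The Python function below is an illustration only and is not part of the described methods. The function name and the at4_specificity labels are hypothetical. It assumes that inactivating epoK blocks conversion of epothilones C and D to A and B, and that a malonyl-specific or methylmalonyl-specific module 4 AT domain restricts the products to the R = H or the R = CH3 series, respectively, in line with the host cells recited in claim 11.

```python
def predicted_epothilones(epoK_active=True, at4_specificity="both"):
    """Return the epothilones a strain is expected to make (illustrative only)."""
    # Desoxy compounds released by the PKS: C (R = H) and D (R = CH3).
    desoxy_by_at4 = {
        "both": ["C", "D"],       # assumed behaviour of the unmodified module 4 AT
        "malonyl": ["C"],         # assumed: R = H series only
        "methylmalonyl": ["D"],   # assumed: R = CH3 series only
    }
    desoxy = desoxy_by_at4[at4_specificity]
    if not epoK_active:
        # No epoxidation, so only the desoxy compounds accumulate.
        return sorted(desoxy)
    # With active epoK, C is converted to A and D to B.
    epoxidized = {"C": "A", "D": "B"}
    return sorted(desoxy + [epoxidized[c] for c in desoxy])


# The epoK knockout strain of this Example.
print(predicted_epothilones(epoK_active=False))
```

Under these assumptions the epoK knockout is predicted to make only epothilones C and D, which is the outcome this Example is designed to achieve.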

Prepare a fresh plate of Sorangium host cells (dispersed) on S42 medium. S42 
medium contains tryptone, 0.5 g/L; MgSO4, 1.5 g/L; HEPES, 12 g/L; agar, 12 g/L, with 
deionized water. The pH of S42 medium is set to 7.4 with KOH. To prepare S42 medium, 
after autoclaving at 121°C for at least 30 minutes, add the following ingredients (per liter): 
CaCl2, 1 g; K2HPO4, 0.06 g; Fe citrate, 0.008 g; glucose, 3.5 g; ammonium sulfate, 0.5 g; 
spent liquid medium, 35 mL; and kanamycin, 200 micrograms/mL, which is added to prevent 
contamination. Incubate the culture at 32°C for 4-7 days, or until orange sorangia appear 
on the surface. 

To prepare a seed culture for inoculating agar plates/bioreactor, the following 
protocol is followed. Scrape off a patch of orange Sorangium cells from the agar (about 5 
mm) and transfer to a 250 mL baffle flask with 38 mm silicone foam closures containing 
50 mL of Soymeal Medium containing potato starch, 8 g; defatted soybean meal, 2 g; yeast 
extract, 2 g; Iron (III) sodium salt EDTA, 0.008 g; MgSO4.7H2O, 1 g; CaCl2.2H2O, 1 g; 
glucose, 2 g; HEPES buffer, 11.5 g. Use deionized water, and adjust pH to 7.4 with 10% 
KOH. Add 2-3 drops of antifoam B to prevent foaming. Incubate in a coffin shaker for 4-5 
days at 30°C and 250 RPM. The culture should appear an orange color. This seed culture 
can be subcultured repeatedly for scale-up to inoculate in the desired volume of production 
medium. 

The same preparation can be used with Medium 1 containing (per liter) 
CaCl2.2H2O, 1 g; yeast extract, 2 g; Soytone, 2 g; FeEDTA, 0.008 g; MgSO4.7H2O, 1 g; 
HEPES, 11.5 g. Adjust pH to 7.4 with 10% KOH, and autoclave at 121°C for 30 minutes. 
Add 8 mL of 40% glucose after sterilization. Instead of a baffle flask, use a 250 mL coiled 
spring flask with a foil cover. Include 2-3 drops of antifoam B, and incubate in a coffin 
shaker for 7 days at 37°C and 250 RPM. Subculture the entire 50 mL into 500 mL of fresh 
medium in a baffled narrow necked Fernbach flask with a 38 mm silicone foam closure. 
Add 0.5 mL of antifoam to the culture. Incubate under the same conditions for 2-3 days. 
Use at least a 10% inoculum for a bioreactor fermentation. 
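
Because the seed cultures are made in 50 mL and 500 mL batches, it can help to scale the recipe amounts to the working volume. The Python sketch below does this for the Soymeal Medium. It assumes the listed amounts are per liter, which the text states explicitly only for Medium 1. The dictionary and function names are hypothetical.

```python
# Soymeal Medium amounts in grams, ASSUMED to be per liter of medium.
SOYMEAL_MEDIUM_G_PER_L = {
    "potato starch": 8.0,
    "defatted soybean meal": 2.0,
    "yeast extract": 2.0,
    "Iron (III) sodium salt EDTA": 0.008,
    "MgSO4.7H2O": 1.0,
    "CaCl2.2H2O": 1.0,
    "glucose": 2.0,
    "HEPES": 11.5,
}


def scale_recipe(recipe_g_per_l, volume_ml):
    """Return grams of each ingredient needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000.0 for name, grams in recipe_g_per_l.items()}


# Amounts for one 50 mL seed flask and for the 500 mL subculture.
seed_flask = scale_recipe(SOYMEAL_MEDIUM_G_PER_L, 50)
subculture = scale_recipe(SOYMEAL_MEDIUM_G_PER_L, 500)

for name, grams in seed_flask.items():
    print(f"{name}: {grams:.4f} g per 50 mL")
```

At the 50 mL scale the trace components fall well below a milligram, so in practice a larger batch or a stock solution may be more convenient to prepare.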

To culture on solid media, the following protocol is used. Prepare agar plates 
containing (per liter of CNS medium) KNO3, 0.5 g; Na2HPO4, 0.25 g; MgSO4.7H2O, 1 g; 
FeCl2, 0.01 g; HEPES, 2.4 g; agar, 15 g; and sterile Whatman filter paper. While the agar 
is not completely solidified, place a sterile disk of filter paper on the surface. When the 
plate is dry, add just enough of the seed culture to coat the surface evenly (about 1 mL). 
Spread evenly with a sterile loop or an applicator, and place in a 32°C incubator for 7 
days. Harvest plates. 

For production in a 5 L bioreactor, the following protocol is used. The 
fermentation can be conducted in a B. Braun Biostat MD-1 5 L bioreactor. Prepare 4 L of 
production medium (same as the soymeal medium for the seed culture without HEPES 
buffer). Add 2% (volume to volume) XAD-16 absorption resin, unwashed and untreated, 
e.g., add 1 mL of XAD per 50 mL of production medium. Use 2.5 N H2SO4 for the acid 
bottle, 10% KOH for the base bottle, and 50% antifoam B for the antifoam bottle. For the 
sample port, be sure that the tubing that will come into contact with the culture broth has a 
small opening to allow the XAD to pass through into the vial for collecting daily samples. 
Stir the mixture completely before autoclaving to evenly distribute the components. 
Calibrate the pH probe and test the dissolved oxygen probe to ensure proper functioning. Use 
a small antifoam probe, ~3 inches in length. For the bottles, use tubing that can be sterile 
welded, but use silicone tubing for the sample port. Make sure all fittings are secure and 
the tubings are clamped off, not too tightly, with C-clamps. Do not clamp the tubing to the 
exhaust condenser. Attach 0.2 µm filter disks to any open tubing that is in contact with the 
air. Use larger ACRO 50 filter disks for larger tubing, such as the exhaust condenser and 
the air inlet tubing. Prepare a sterile empty bottle for the inoculum. Autoclave at 121°C 
with a sterilization time of 90 minutes. Once the reactor has been taken out of the 
autoclave, connect the tubing to the acid, base, and antifoam bottles through their 
respective pump heads. Release the clamps to these bottles, making sure the tubing has not 
been welded shut. Attach the temperature probe to the control unit. Allow the reactor to 
cool, while sparging with air through the air inlet at a low air flow rate. 

After ensuring the pumps are working and there is no problem with flow rate or 
clogging, connect the hoses from the water bath to the water jacket and to the exhaust 
condenser. Make sure the water jacket is nearly full. Set the temperature to 32°C. Connect 
pH, D.O., and antifoam probes to the main control unit. Test the antifoam probe for proper 
functioning. Adjust the pH set point of the culture to 7.4. Set the agitation to 400 RPM. 
Calibrate the D.O. probe using air and nitrogen gas. Adjust the airflow using the rate at 
which the fermentation will operate, e.g. 1 LPM (liter per minute). To control the 
dissolved oxygen level, adjust the parameters under the cascade setting so that agitation 
will compensate for lower levels of air to maintain a D.O. value of 50%. Set the minimum 
and maximum agitation to 400 and 1000 RPM respectively, based on the settings of the 
control unit. Adjust the settings, if necessary. 
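
The cascade setting described above, in which agitation rises to hold dissolved oxygen at 50% within the 400 to 1000 RPM limits, can be pictured with a simple proportional rule. This is a sketch only: the control algorithm actually used by the Biostat control unit is not described in this document, and the function name and the gain value are hypothetical.

```python
def cascade_agitation(do_percent, current_rpm, setpoint=50.0, gain=10.0,
                      min_rpm=400.0, max_rpm=1000.0):
    """Return a new agitation speed that nudges D.O. toward the setpoint.

    gain is a hypothetical tuning value (RPM change per % D.O. error).
    """
    error = setpoint - do_percent          # positive when D.O. is too low
    new_rpm = current_rpm + gain * error   # stir faster when D.O. is too low
    # Keep the result inside the minimum/maximum agitation limits.
    return max(min_rpm, min(max_rpm, new_rpm))


# D.O. has dropped to 40% while stirring at 400 RPM.
print(cascade_agitation(40.0, 400.0))
```

The clamp at the end mirrors the minimum and maximum agitation settings: once the stirrer reaches 1000 RPM, any further oxygen demand would have to be met by raising the airflow instead.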

Check the seed culture for any contamination before inoculating the fermenter. The 
Sorangium cellulosum cells are rod shaped like a pill, with 2 large distinct circular 
vacuoles at opposite ends of the cell. Length is approximately 5 times that of the width of 
the cell. Use a 10% inoculum (minimum) volume, e.g. 400 mL into 4 L of production 
medium. Take an initial sample from the vessel and check against the bench pH. If the 
difference between the fermenter pH and the bench pH is > 0.1 units, do a 1 point 
recalibration. Adjust the deadband to 0.1. Take daily 25 mL samples noting fermenter pH, 
bench pH, temperature, D.O., airflow, agitation, acid, base, and antifoam levels. Adjust pH 
if necessary. Allow the fermenter to run for seven days before harvesting. 
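
The volume rules and the pH check in this protocol reduce to simple arithmetic, sketched below in Python. The function names are hypothetical. The 2% XAD-16 loading, the 10% minimum inoculum, and the 0.1 unit pH tolerance are taken from the text above.

```python
def xad_volume_ml(medium_ml, percent_v_v=2):
    """Volume of XAD-16 resin for a given volume of production medium."""
    return medium_ml * percent_v_v / 100


def min_inoculum_ml(medium_ml, percent=10):
    """Minimum seed culture volume for a given volume of production medium."""
    return medium_ml * percent / 100


def needs_recalibration(fermenter_ph, bench_ph, tolerance=0.1):
    """True when the fermenter and bench pH readings differ by more than tolerance."""
    return abs(fermenter_ph - bench_ph) > tolerance


medium_ml = 4000  # the 4 L of production medium used in this Example
print(xad_volume_ml(medium_ml))          # resin volume in mL
print(min_inoculum_ml(medium_ml))        # inoculum volume in mL
print(needs_recalibration(7.40, 7.25))   # readings 0.15 units apart
```

For the 4 L batch this gives 80 mL of resin and 400 mL of inoculum, matching the 1 mL per 50 mL and 400 mL into 4 L figures given in the protocol.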

Extraction and analysis of compounds is performed substantially as described 
above in Example 4. In brief, fermentation culture is extracted twice with ethyl acetate, 
and the ethyl acetate extract is concentrated to dryness and dissolved/suspended in ~500 
µL of MeCN-H2O (1:1). The sample is loaded onto a 0.5 mL Bakerbond ODS SPE 
cartridge pre-equilibrated with MeCN-H2O (1:1). The cartridge is washed with 1 mL of 
the same solvent, followed by 2 mL of MeCN. The MeCN eluent is concentrated to 
dryness, and the residue is dissolved in 200 µL of MeCN. Samples (50 µL) are analyzed 
by HPLC/MS on a system comprised of a Beckman System Gold HPLC and PE Sciex 
API100LC single quadrupole MS-based detector equipped with an atmospheric pressure 
chemical ionization source. Ring and orifice voltages are set to 75V and 300V, 
respectively, and a dual range mass scan from m/z 290-330 and 450-550 is used. HPLC 
conditions: Metachem 5µ ODS-3 Inertsil (4.6 X 150 mm); 100% H2O for 1 min, then to 
100% MeCN over 10 min at 1 mL/min. Epothilone A elutes at 0.2 min under these 
conditions and gives characteristic ions at m/z 494 (M+H), 476 (M+H-H2O), 318, and 306. 
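
The gradient and the expected ions can be checked with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes the gradient is linear, uses nominal (integer) atomic masses, and takes the molecular formula of epothilone A to be C26H39NO6S, which is not stated in this section. All names are hypothetical.

```python
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}
EPOTHILONE_A = {"C": 26, "H": 39, "N": 1, "O": 6, "S": 1}  # assumed formula
SCAN_WINDOWS = [(290, 330), (450, 550)]  # dual range mass scan from the text


def nominal_mass(formula):
    """Sum of nominal atomic masses for a formula given as {element: count}."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def percent_mecn(t_min):
    """Percent MeCN at time t: 100% H2O for 1 min, then linear to 100% MeCN over 10 min."""
    if t_min <= 1.0:
        return 0.0
    if t_min >= 11.0:
        return 100.0
    return (t_min - 1.0) / 10.0 * 100.0


def in_scan_window(mz):
    """True when m/z falls inside one of the two scanned ranges."""
    return any(low <= mz <= high for low, high in SCAN_WINDOWS)


m_plus_h = nominal_mass(EPOTHILONE_A) + 1   # protonated molecule
m_plus_h_minus_water = m_plus_h - 18        # loss of H2O from M+H

print(m_plus_h, m_plus_h_minus_water)
print(percent_mecn(6.0))
```

With the assumed formula, the computed values reproduce the m/z 494 and 476 ions reported above, and all four characteristic ions fall inside the two scanned ranges.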

Example 8 

Epothilone Derivatives as Anti-Cancer Agents 
The novel epothilone derivatives of Formula (1) set forth above are 
potent anti-cancer agents and can be used for the treatment of patients with various forms 
of cancer, including but not limited to breast, ovarian, and lung cancers. 

The epothilone structure-activity relationships based on tubulin binding assay 
(see Nicolaou et al., 1997, Angew. Chem. Int. Ed. Engl. 36: 2097-2103, incorporated 
herein by reference) are illustrated by the diagram below. 



A) (3S) configuration important; B) 4,4-ethano group not tolerated; C) (6R, 7S) 
configuration crucial; D) (8S) configuration important, 8,8-dimethyl group not tolerated; 
E) epoxide not essential for tubulin polymerization activity, but may be important for 
cytotoxicity; epoxide configuration may be important; R group important; both olefin 
geometries tolerated; F) (15S) configuration important; G) bulkier group reduces activity; 
H) oxygen substitution tolerated; I) substitution important; J) heterocycle important. 

Thus, this SAR indicates that modification of the C1-C8 segment of the molecule 
can have strong effects on activity, whereas the remainder of the molecule is relatively 
tolerant to change. Variation of substituent stereochemistry within the C1-C8 segment, or 
removal of the functionality, can lead to significant loss of activity. Epothilone derivative 
compounds A-H differ from epothilone by modifications in the less sensitive portion of 
the molecule and so possess good biological activity and offer better pharmacokinetic 
characteristics, having improved lipophilic and steric profiles. 

These novel derivatives can be prepared by altering the genes involved in the 
biosynthesis of epothilone, optionally followed by chemical modification. The 
9-hydroxy-epothilone derivatives prepared by genetic engineering can be used to generate 
the carbonate derivatives (compound D) by treatment with triphosgene or 
1,1'-carbonyldiimidazole in the presence of a base. In a similar manner, the 
9,11-dihydroxy-epothilone derivative, upon proper protection of the C-7 hydroxyl group if 
it is present, yields the carbonate derivatives (compound F). Selective oximation of the 
9-oxo-epothilone derivatives with hydroxylamine followed by reduction (Raney nickel in the 
presence of hydrogen or sodium cyanoborohydride) yields the 9-amino analogs. Reacting 
these 9-amino derivatives with p-nitrophenyl chloroformate in the presence of base and 
subsequently reacting with sodium hydride will produce the carbamate derivatives 
(compound E). Similarly, the carbamate compound G, upon proper protection of the C-7 
hydroxyl group if it is present, can be prepared from the 9-amino-11-hydroxy-epothilone 
derivatives. 

Illustrative syntheses are provided below. 

Part A. Epothilone D-7,9-cyclic carbonate 

To a round bottom flask, a solution of 254 mg epothilone D in 5 mL of methylene 
chloride is added. It is cooled in an ice bath, and 0.3 mL of triethylamine is then added. 
To this solution, 104 mg of triphosgene is added. The ice bath is removed, and the mixture 
is stirred under nitrogen for 5 hours. The solution is diluted with 20 mL of methylene 
chloride and washed with dilute sodium bicarbonate solution. The organic solution is dried 
over magnesium sulfate and filtered. Upon evaporation to dryness, the epothilone 
D-7,9-cyclic carbonate is isolated. 

Part B. Epothilone D-7,9-cyclic carbamate 
(i) 9-amino-epothilone D 

To a round bottom flask, a solution of 252 mg 9-oxo-epothilone D in 5 mL of 
methanol is added. Upon the addition of 0.5 mL 50% hydroxylamine in water and 0.1 mL 
acetic acid, the mixture is stirred at room temperature overnight. The solvent is then 
removed under reduced pressure to yield the 9-oxime-epothilone D. To a solution of this 
9-oxime compound in 5 mL of tetrahydrofuran (THF) in an ice bath is added 0.25 mL of a 1 M 
solution of cyanoborohydride in THF. After the mixture is allowed to react for 1 hour, the 
ice bath is removed, and the solution is allowed to warm slowly to room temperature. One 
mL of acetic acid is added, and the solvent is then removed under reduced pressure. The 
residue is dissolved in 30 mL of methylene chloride and washed with saturated sodium 
chloride solution. The organic layer is separated, dried over magnesium sulfate, and 
filtered. Evaporation of the solvent yields the 9-amino-epothilone D. 

(ii) Epothilone D-7,9-cyclic carbamate 

To a solution of 250 mg of 9-amino-epothilone D in 5 mL of methylene chloride is added 
110 mg of 4-nitrophenyl chloroformate followed by the addition of 1 mL of triethylamine. 
The solution is stirred at room temperature for 16 hours. It is diluted with 25 mL of 
methylene chloride. The solution is washed with saturated sodium chloride, and the organic 
layer is separated and dried over magnesium sulfate. After filtration, the solution is 
evaporated to dryness at reduced pressure. The residue is dissolved in 10 mL of dry THF. 
Sodium hydride, 40 mg (60% dispersion in mineral oil), is added to the solution in an ice 
bath. The ice bath is removed, and the mixture is stirred for 16 hours. One-half mL of 
acetic acid is added, and the solution is evaporated to dryness under reduced pressure. The 
residue is re-dissolved in 50 mL methylene chloride and washed with saturated sodium 
chloride solution. The organic layer is dried over magnesium sulfate, the solution is 
filtered, and the organic solvent is evaporated to dryness under reduced pressure. Upon 
purification on a silica gel column, the epothilone D-7,9-cyclic carbamate is isolated. 

The invention having now been described by way of written description and 
examples, those of skill in the art will recognize that the invention can be practiced in a 
variety of embodiments and that the foregoing description and examples are for purposes 
of illustration and not limitation of the following claims. 

1. An isolated recombinant nucleic acid compound that comprises a 
nucleotide sequence encoding at least a domain of an epothilone polyketide synthase 
(PKS) protein and/or encoding a functional region of an epothilone modification enzyme. 

2. The nucleic acid of claim 1, wherein said domain is selected from the group 
consisting of a loading domain, a thioesterase domain, an NRPS, an AT domain, a KS 
domain, an ACP domain, a KR domain, a DH domain, and an ER domain, a methyl 
transferase domain and a functional oxidase domain. 


3. The nucleic acid of claim 1 or 2 that comprises the coding sequence of an 
epoA gene, and/or 

the coding sequence of an epoB gene, and/or 
the coding sequence of an epoC gene, and/or 
the coding sequence of an epoD gene, and/or 
the coding sequence of an epoE gene, and/or 
the coding sequence of an epoF gene, and/or 
the coding sequence of an epoK gene, and/or 
the coding sequence of an epoL gene. 


4. The nucleic acid of any of claims 1-3 that further comprises a promoter 
positioned to transcribe said encoding nucleotide sequence in host cells in which said 
promoter is operable. 

5. The nucleic acid of claim 4, wherein said promoter is a promoter from a 
Sorangium gene, or 

from a Myxococcus gene, or 

from a Streptomyces gene, or 

from an epothilone PKS gene, or 

from a pilA gene, or 

from an actinorhodin PKS gene. 

6. The nucleic acid of any of claims 1-5 that is a recombinant DNA 
expression vector. 

7. Host cells which contain the nucleic acid of any of claims 4-6. 


8. The cells of claim 7 which are Sorangium cells, or 
Myxococcus cells, or 
Pseudomonas cells, or 
Streptomyces cells. 


9. A method to produce a polyketide which method comprises culturing the 
cells of claim 7 or 8 under conditions wherein the encoding nucleotide sequence is 
expressed to obtain a functional PKS. 


10. A recombinant Sorangium cellulosum host cell that contains a mutated 
gene for an epothilone PKS protein or epothilone modification enzyme, wherein said 
mutated gene was inserted in whole or in part into genomic DNA of said cell by 
homologous recombination with a recombinant vector comprising all or a part of an 
epothilone PKS gene or epothilone modification gene. 

11. The recombinant host cell of claim 10 that 

makes epothilone C or D but not A or B due to a mutation inactivating or deleting 
an epoK gene, or 

makes epothilone A or C but not B or D due to a mutation in epoD altering module 
4 AT domain specificity, or 

makes epothilone B or D but not A or C due to a mutation in epoD altering module 
4 AT domain specificity, or 

makes epothilone C but not epothilone A, B or D due to a mutation in epoD 
altering module 4 AT domain specificity and a mutation in epoK, or 

makes epothilone D but not epothilone A, B or C due to a mutation in epoD 
altering module 4 AT domain specificity and a mutation in epoK. 

12. Recombinant Streptomyces or Myxococcus host cells that express an 
epothilone PKS gene or an epothilone modification enzyme gene, optionally comprising 
one or more of said epothilone PKS or modification enzyme genes integrated into their 
chromosomal DNA and/or one or more of said epothilone PKS or modification enzyme 
genes on an extrachromosomal expression vector. 

13. The host cells of claim 12 that are S. coelicolor CH999. 

14. A method to produce an epothilone or epothilone derivative which 
comprises culturing the cells of claims 12 or 13. 


15. A modified functional epothilone PKS wherein said modification 
comprises at least one of: 

replacement of at least one AT domain with an AT domain of different specificity; 
inactivation of the NRPS-like module 1 or of the KS2 catalytic domain; 
inactivation of at least one activity in at least one β-carbonyl modification domain; 
addition of at least one of KR, DH and ER activity in at least one β-carbonyl 
modification domain; and 
replacement of the NRPS module 1 with an NRPS of different specificity. 


16. The modified PKS of claim 15 contained in a cell or contained in a cell-free 
system, wherein said cell or system contains additional enzymes for modification of the 
product of said epothilone PKS. 

17. The modified PKS of claim 16 wherein said modifying enzymes comprise 
at least one of a methyltransferase, an oxidase or a glycosylation enzyme. 


18. A method to prepare an epothilone derivative which method comprises 
providing substrates including extender units to the modified PKS of any of claims 15-17. 

19. A modified functional epothilone PKS wherein said modification 
comprises inactivation of the NRPS of module 1 or the KS2 of module 2 thereof. 

20. A method to make an epothilone derivative which method comprises 
contacting the modified PKS of claim 19 with a module 2 substrate or a module 3 
substrate and extender units. 

21. Recombinant host cells which comprise the modified PKS of any of claims 
15-17 or 19. 

22. The cells of claim 21 that produce an epothilone derivative selected from 
the group consisting of 16-desmethyl epothilones, 14-methyl epothilones, 11-hydroxyl 
epothilones, 10-methyl epothilones, 8,9-anhydro epothilones, 9-hydroxyl epothilones, 
9-keto epothilones, 8-desmethyl epothilones, and 6-desmethyl epothilones. 

23. A compound selected from the group consisting of 16-desmethyl 
epothilones, 14-methyl epothilones, 11-hydroxyl epothilones, 10-methyl epothilones, 
8,9-anhydro epothilones, 9-hydroxyl epothilones, 9-keto epothilones, 8-desmethyl 
epothilones, and 6-desmethyl epothilones. 

24. A recombinant PKS enzyme that comprises one or more domains, modules, 
or proteins of a non-epothilone PKS and one or more domains, modules, or proteins of an 
epothilone PKS, and/or 

contains a loading domain that comprises a KS° domain. 

25. The PKS enzyme of claim 24, wherein 

said PKS comprises a DEBS loading domain and 5 modules of DEBS and an 
NRPS of the epothilone PKS, 

wherein said PKS comprises all of a non-epothilone PKS with an MT domain of 
the epothilone PKS. 

26. A compound of the formula: 



including the glycosylated forms thereof and stereoisomeric forms where the 
stereochemistry is not shown, 

wherein A is a substituted or unsubstituted straight, branched chain or cyclic alkyl, 
alkenyl or alkynyl residue optionally containing 1-3 heteroatoms selected from O, S and 
N; or wherein A comprises a substituted or unsubstituted aromatic residue; 

R4 represents H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

X5 represents =O or a derivative thereof, or H,OH or H,NR2 wherein R is H, alkyl 
or acyl, or H,OCOR2, H,OCONR2 wherein R is H or alkyl, or is H,H; 

R6 represents H or lower alkyl, and the remaining substituent on the corresponding 
carbon is H; 

X7 represents OR, or NR2, wherein R is H, alkyl or acyl or is OCOR, or OCONR2 
wherein R is H or alkyl or X7 taken together with X9 forms a carbonate or carbamate 
cycle, and wherein the remaining substituent on the corresponding carbon is H; 

R8 represents H or lower alkyl and the remaining substituent on the carbon is H; 

X9 represents =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl 
or acyl, or is H,OCOR or H,OCONR2, wherein R is H or alkyl, or represents H,H or 
wherein X9 together with X7 or with X11 can form a cyclic carbonate or carbamate; 

R10 is H,H or H,lower alkyl, or lower alkyl,lower alkyl; 

X11 is =O or a derivative thereof, or H,OR, or H,NR2 wherein R is H, alkyl or acyl 
or H,OCOR or H,OCONR2 wherein R is H or alkyl, or is H,H or wherein X11 in 
combination with X9 may form a cyclic carbonate or carbamate; 

R12 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

X13 is =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl or acyl 
or is H,OCOR or H,OCONR2 wherein R is H or alkyl; 

R14 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; 

R16 is H or lower alkyl; and 

wherein optionally H or another substituent may be removed from positions 12 and 
13 and/or 8 and 9 to form a double bond, wherein said double bond may optionally be 
converted to an epoxide. 

1(d), 

and 

wherein both Z are O or one Z is N and the other Z is O and the remaining substituents are 
defined as in claim 26. 


28. A recombinant vector selected from the group consisting of 
pKOS35-70.8A3, pKOS35-70.1A2, pKOS35-70.4, pKOS35-79.85, pKOS039-124R, and 
pKOS039-126R. 